Hackipedia hand edited PDF to text conversion. This file is UTF-8, please make
sure your text viewer supports it.
---------------------------------------------------------------
Introduction
------------
Word for Windows uses the same basic file format for its document, glossary, and
autosave files. This document describes the Word for Windows document format
with additional comments explaining the differences for glossary and autosave
files.
The most important sections of a Word document are the text and formatting
sections. The text section is straight ANSI text, although some of the low order
characters have been reserved for special use, such as forced line feeds and
page breaks. No formatting "reveal" codes exist in the text section. In a Word
for Windows document, formatting is stored in the special sections and related
to the text by sequential tables. Extracting textual information from a Word for
Windows document is very simple: read the ANSI text section and ignore the
formatting section.
A list of differences between the file format for Word for Windows versions 1.x
and 2.0 has been added to this document as Appendix A.
Table of Contents
Introduction
Table of Contents
Definitions
    page (or sector):
    document:
    file:
    CP (Character Position):
    FC (File Character position):
    PLCF (PLex of CPs (or FCs) stored in File):
    piece table:
    sprm (Single PRoperty Modifier):
    grpprl (group of prls):
    prm (PRoperty Modifier):
    full-saved (or non-complex) file:
    fast-saved (or complex) file:
    FIB (File Information Block):
    paragraph
    run of text
    section
    style
    CHP (CHaracter Properties)
    CHPX (Character Property EXception)
    PAP (PAragraph Properties)
    PAPX (PAragraph Property EXception)
    table row:
    TAP (TAble Properties):
    STSH (STyle SHeet)
    FKP (Formatted disK Page):
    bin table
    SEP (SEction Properties)
    SEPX (SEction Property EXceptions)
    DOP (DOcument Properties)
    sub-document
    field
Naming Conventions
Non-Complex File Format
Complex File Format
File Information Block (FIB)
Text
Character and Paragraph Formatting Properties
Bin Tables
Style Sheets
SPRM Definitions
Complex File Format
    Algorithm to determine the bounds of a paragraph containing a certain character in a complex file
    Algorithm to determine paragraph properties for a paragraph in a complex file
    Algorithm to determine table properties for a table row in a complex file
    Algorithm to determine the character properties of a character in a complex file
    Algorithm to determine the section properties of a section in a complex file
    Algorithm to determine the PIC of a picture in a complex file.
Footnotes
Headers and Footers
Page Table
Glossary Files
sttbfAssoc (Table of Associated Strings)
Structure Definitions
    BRC: Border Code
    BRC10: Border Code for Word for Windows 1.0
    CHP/CHPX: Character Properties
    CHP10/CHPX: Character Properties for Word for Windows 1.0
    DOP: Document Properties
    DTTM: Date and Time (internal date format)
    FIB: File Information Block
    FKP: Formatted Disk Page
    FLD: Field Descriptor
    OBJHEADER: Embedded Object Properties
    PAP: Paragraph Properties
    PAPX: Paragraph Property Exceptions
    PCD: Piece Descriptor
    PGD: Page Descriptor
    PHE: Paragraph Height
    PIC: Picture Descriptor
    PLCF: Plex of CPs stored in File
    PRM: Property Modifier
        PRM: Property Modifier (variant 1)
        PRM: Property Modifier (variant 2)
    SED: Section Descriptor
    SEP: Section Properties
    SEPX: Section Property Exceptions
    TAP: Table Properties
    TBD: Tab Descriptor
    TC: Table Cell Descriptors
Appendix A - Changes from version 1.x to 2.0
    Changes to Structures
        BRC
        CHP
        DOP
        OBJHEADER
        PAP
        PGD
        PIC
        SED
        SEP
        TAP
        TC
    Other changes
        Autosave Source
        Embedded Objects
        Hand Annotation
        New Sprm definitions
        sttbfAssoc
        sttbfFn
    Index of Changes from version 1.x to 2.0
Appendix B: Revision History
Definitions
-----------
page (or sector):
A 512-byte segment of a Word for Windows file that begins on a 512-byte
boundary (bytes 0-511 are in page 0, bytes 512-1023 are in page 1, etc.).
In Word data structures, an unsigned two-byte integer page number is given
the acronym PN (for Page Number).
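The page arithmetic above can be sketched as follows. This is only an illustration; the names PAGE_SIZE, page_of_offset, and page_start are assumptions introduced here and do not appear in the Word documentation.

```python
PAGE_SIZE = 512  # size of one page (sector) in bytes, as defined above


def page_of_offset(offset):
    """Return the PN of the page that contains the given byte offset."""
    if offset < 0:
        raise ValueError("file offsets are non-negative")
    return offset // PAGE_SIZE


def page_start(pn):
    """Return the byte offset at which page number pn begins."""
    if pn < 0:
        raise ValueError("page numbers are non-negative")
    return pn * PAGE_SIZE
```

For example, page_of_offset(511) is 0 and page_of_offset(512) is 1, matching the byte ranges given in the definition.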
document:
A named, multi-linked list of data structures, representing an ordered
stream of text with properties that was produced by a user of Microsoft Word.
file:
The physical encoding of a Word document's text and sub data structures in
a random access file.
CP (Character Position):
A four-byte integer which is the position coordinate of a character of text
within the logical text stream of a document.
FC (File Character position):
A four-byte integer which is the byte offset of a character (or other
object) from the beginning of the file. Before a file has been edited (i.e.
in a full-saved Word document), CPs can be transformed into FCs by adding
the FC coordinate of the beginning of a document's text stream to the CP.
After a file has been edited (i.e. in a fast-saved Word document), the
mapping from CP to FC is recorded in the piece table (see below).
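For the full-saved case, the CP-to-FC transformation is a single addition. The sketch below assumes the caller has already read the FC of the beginning of the text stream from the file; the parameter name fc_text_start is a hypothetical name chosen for this example.

```python
def cp_to_fc_full_saved(cp, fc_text_start):
    """Map a CP to an FC in a full-saved (non-complex) file.

    fc_text_start is a hypothetical name for the FC coordinate of the
    beginning of the document's text stream.
    """
    if cp < 0:
        raise ValueError("CPs are non-negative")
    return fc_text_start + cp
```

This shortcut does not apply to fast-saved files, where the piece table must be consulted instead.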
PLCF (PLex of CPs (or FCs) stored in File):
A data structure consisting of two parallel arrays that allows a relation to
be established between a certain CP position in the document text stream (or
FC position in a file) and an arbitrary data structure. It consists of an
array of n+1 CPs or FCs followed by an array of n instances of a particular
arbitrary data structure. In typical usage, the nth CP or FC of the PLCF is
in one-to-one correspondence with the nth instance of the arbitrary data
structure, with the n+1st CP or FC marking the limit of the nth instance's
influence. When a PLCF is used to record a partitioning of the document's
text stream or a partitioning of the bytes stored in a [...] of the 0th mark or
link. To properly interpret a PLCF stored in a Word file, the length of the
stored PLCF and the length of the arbitrary data structure stored in the
PLCF must be known. The length of the stored PLCF is recorded in the FIB.
The lengths of the data structures stored in PLCFs within Word files are
listed later in this document.
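The relationship between the stored length and the structure length can be sketched as below: a PLCF of n entries occupies 4(n+1) bytes of CPs or FCs plus n structures. The sketch assumes little-endian four-byte signed integers, and the names parse_plcf and cb_struct are illustrative, not taken from the Word documentation.

```python
import struct


def parse_plcf(data, cb_struct):
    """Split a stored PLCF into its positions and its raw structures.

    data      -- the bytes of the stored PLCF (its length comes from the FIB)
    cb_struct -- length in bytes of one instance of the stored structure
    Returns (positions, items): n+1 integers and n byte strings.
    Assumption: positions are little-endian four-byte signed integers.
    """
    cb = len(data)
    if cb < 4 or (cb - 4) % (4 + cb_struct) != 0:
        raise ValueError("PLCF length is inconsistent with the structure size")
    n = (cb - 4) // (4 + cb_struct)
    positions = list(struct.unpack_from("<%di" % (n + 1), data, 0))
    base = 4 * (n + 1)  # the structures follow the array of positions
    items = [data[base + i * cb_struct: base + (i + 1) * cb_struct]
             for i in range(n)]
    return positions, items
```

The length check is why both lengths must be known: without cb_struct there is no way to tell where the array of positions ends.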
piece table:
The piece table is a data structure that describes the logical sequence of
characters in a Word document and records recent changes to the formatting
of a Word document. It is stored in a Word file as a PLCF named the plcfpcd
(PLex of Cps containing Piece Descriptors). The piece table relates a logical
character number, called a CP (Character Position), to a physical location
within a Word file (an FC). The array of CPs in the plcfpcd defines a
partitioning of the Word document into disjoint pieces. The second array is
an array of PCDs (Piece Descriptors) in 1-to-1 correspondence with the array
of CPs; each PCD records the physical location in the Word file where
the corresponding piece begins. To find the physical location of a
particular logical character in a Word document, take the CP coordinate of
that character within the document and find the piece that contains that
character. This is done by finding the index of the largest CP in the array
of CPs that is less than or equal to the character CP. Then reference the PCD
with that index in the array of PCDs. The FC stored in the PCD gives the position of
the beginning of the piece in the file. Finally, add the offset of the
desired character from the beginning of its piece to the FC of the beginning
of the piece. This gives the actual file offset of the character.
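The lookup described above can be sketched with a binary search. The sketch assumes the plcfpcd has already been decoded into two Python lists, here called piece_cps and piece_fcs (hypothetical names), and it treats a character whose CP equals the first CP of a piece as belonging to that piece.

```python
from bisect import bisect_right


def cp_to_fc(cp, piece_cps, piece_fcs):
    """Translate a CP into a file offset (FC) using a decoded piece table.

    piece_cps -- the n+1 ascending CPs of the plcfpcd
    piece_fcs -- the n FCs taken from the corresponding PCDs
    """
    if len(piece_cps) != len(piece_fcs) + 1:
        raise ValueError("expected n+1 CPs for n pieces")
    if not piece_cps[0] <= cp < piece_cps[-1]:
        raise ValueError("CP lies outside the piece table")
    # Index of the piece whose starting CP is the largest one not above cp.
    i = bisect_right(piece_cps, cp) - 1
    # Offset of the character within its piece, added to the piece's FC.
    return piece_fcs[i] + (cp - piece_cps[i])
```

With piece_cps = [0, 10, 25] and piece_fcs = [1024, 768], CP 9 maps to FC 1033 and CP 10 maps to FC 768, which shows how the logical order can differ from the physical order.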
sprm (Single PRoperty Modifier):
An instruction to modify one or more properties within one of the
property-defining data structures (CHP, PAP, TAP, SEP, or PIC). It consists
of an operation code, which identifies the field(s) to be changed, and an
operand, which either gives the new value of a particular field or is a
parameter to a procedure that changes the field or fields. The
operand is omitted for sprms whose opcodes completely specify the values
that must be stored in the property data structure. A synonym used for sprm
in some data structure definitions is prl (property modifiers stored in a
list).
grpprl (group of prls):
A grpprl is a data structure that records a set of sprms. The 0th sprm is
recorded at offset 0 of the structure. Any succeeding sprms are recorded
immediately after the end of the preceding sprm. To traverse a grpprl and
locate the sprms recorded within it, it’s necessary to fetch the opcode of
the first sprm, look up the length of the sprm with that opcode, use that
length to skip past the first sprm, fetch the opcode of the second sprm,
look up the length of that sprm, use the length to skip the second sprm, and
so on. See the table in the “SPRM Definitions” topic to determine the length
of a sprm. The phrase “apply the sprms of a grpprl (or PAPX or SEPX)” used
later in this document means to fetch the 0th sprm recorded in the grpprl
and perform the action for that sprm, fetch the next sprm and perform its
action, and continue this procedure until all sprms in the grpprl (or PAPX
or SEPX) have been processed.
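The traversal described above can be sketched as a generator. Two assumptions are made for illustration: the opcode occupies a single byte, and the caller supplies a table (sprm_length, a hypothetical name) that maps each opcode to the total length of its sprm, opcode included. The real lengths come from the SPRM Definitions table and are not reproduced here; sprms whose length depends on their operand are not handled.

```python
def iter_sprms(grpprl, sprm_length):
    """Yield (opcode, operand_bytes) for each sprm stored in a grpprl.

    grpprl      -- bytes of the grpprl
    sprm_length -- hypothetical mapping: opcode -> total sprm length in
                   bytes, counting the one-byte opcode (an assumption)
    """
    pos = 0
    while pos < len(grpprl):
        opcode = grpprl[pos]
        length = sprm_length[opcode]
        if length < 1 or pos + length > len(grpprl):
            raise ValueError("sprm runs past the end of the grpprl")
        # The operand is empty when the opcode alone specifies the change.
        yield opcode, grpprl[pos + 1: pos + length]
        pos += length  # skip to the opcode of the next sprm
```

"Applying" a grpprl then amounts to looping over iter_sprms in order and performing each sprm's action on the property structure.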
prm (PRoperty Modifier):
A field in piece table entries that records how the properties of text
within a piece were changed to reflect user formatting operations. The prm
usually contains an index to a grpprl which records the user’s formatting
changes as a group of sprms. If the user has made only a small change to
formatting that can be expressed as a single 1- or 2-byte sprm, that sprm is
stored within the prm.
full-saved (or non-complex) file:
A Word file in which the physical order of characters stored in the file is
identical to the logical order of characters in the document that the file
represents. The text stream of a non-complex file can be described by an fc
(an offset from the beginning of the file) to mark where the text begins and
a ccp (count of CPs) to [...]
fast-saved (or complex) file:
A Word file in which the physical order of characters stored in the file
does not match the logical order of characters in the document that the file
represents. A piece table must be stored in the file to describe the text
stream of the document.
FIB (File Information Block):
The header of a Word for Windows file. Begins at offset 0 in file. Gives the beginning offset and lengths of
the document's text stream and subsidiary data structures within the file. Also stores other file status
information.
paragraph
A contiguous sequence of characters within the text stream of a document that is delimited by a paragraph
mark, cell mark, row mark, or a section mark (These are special characters described later in this document).
run of text
A contiguous sequence of characters within the text stream of a document that have the same character
formatting properties. A single run may cross paragraph boundaries and may encompass the entire
document.
section
A contiguous sequence of paragraphs within the text stream of a document that is delimited by a section
mark or by the final paragraph mark at the end of a document. Users frequently treat sections as the
equivalent of a chapter in a book. The boundaries of sections mark locations where the layout rules for a
document (number of columns, text of headers and footers to use, whether page numbers should be
displayed, etc.) are changed.
style
A named set of character and paragraph properties that can be associated with any number of paragraphs in
a Word document's text stream. A style provides a set of property defaults for any paragraph tagged with
that style. When a new paragraph is created and given a particular style, newly typed text is given the
character and paragraph properties of that style unless the user makes an exception to the style definition.
CHP (CHaracter Properties)
The data structure describing the character properties of a run of text.
CHPX (Character Property EXception)
A data structure with the same form as a CHP but which has different semantics. It describes how the
properties of a run of text differ from the character properties of the styles of paragraphs that contain the run.
By applying a CHPX to the character properties (CHP) inherited by a particular paragraph from its style, it
is possible to reconstitute the CHP for the portion of the character run that intersects that paragraph.
PAP (PAragraph Properties)
The data structure which describes the properties of a particular paragraph.
PAPX (PAragraph Property EXception)
A data structure describing how a particular paragraph’s properties differ from the paragraph properties of
the style assigned to the paragraph. By applying a PAPX to the paragraph properties (PAP) inherited by a
particular paragraph from its style, it is possible to reconstitute the PAP for that paragraph. The PAPX
contains an STC (a style code to identify the style in control of the paragraph), paragraph height
information, and a grpprl which specifies how the style's paragraph properties must be changed to produce
[...]
table row:
[...] sequences of paragraphs called cells. The last paragraph of each cell is terminated by a special paragraph
mark called a cell mark. Following the cell mark that ends the last cell of a table row, the table row is
terminated by a special paragraph mark called a row mark. When Word displays a table row, it assigns a
rectangular display area to each cell in the row. The tops of all cell display areas are aligned at the
same vertical position on a page. The leftmost display area in a table row is assigned to the 0th cell of the
row; the next display area to the right is assigned to the 1st cell of the row, etc. The text of the cell is
wrapped to fit its display area. As more text is added to the cell, the cell display area extends downward. A
set of table properties that determine how many cells are in a row, where the horizontal boundaries of cell
display areas are, and what borders are drawn around each cell in the table is stored for the row mark that
marks the end of the table row.
TAP (TAble Properties):
The data structure which describes the properties of a single table row. The information in the TAP for a
table row is stored in a Word file as a list of sprms that modify a TAP which has been cleared to zeros. This
list of table sprms is appended to the grpprl of paragraph sprms that is recorded in the PAPX for the row
mark that delimits the end of a table row.
STSH (STyle SHeet)
A data structure which represents every style defined within the Word document. The STSH records a
unique name string for every style and associates each name with a particular CHP and PAP. The indexes
used to refer to individual styles are called STCs (STyle Codes). Every PAPX for every paragraph recorded
in a document contains an STC which identifies the style from which a paragraph inherited its default
character and paragraph properties. CHPXs recorded for the text within the paragraph and PAPXs recorded
for the paragraph itself encode changes that the user has made with respect to the style’s default properties.
FKP (Formatted disK Page):
A data structure that fits in one 512-byte page that encodes either the character properties or the paragraph
properties of a certain portion of a Microsoft Word file. An FKP consists of four components:
1) a count of the number of runs or paragraphs described by the page.
2) an array of FCs recorded in ascending order demarcating the boundaries between runs or
paragraphs that are recorded adjacent to one another in the Word file.
3) an array of offsets within the FKP in one to one correspondence with the array of FCs that
locate the properties of the run or paragraph that begins at a particular FC.
4) a group of CHPXs if the FKP stores character properties or a group of PAPXs if the FKP stores
paragraph and table properties.
To find the CHPX/PAPX corresponding to a particular character in a document, calculate the FC coordinate
for that character. Then search the FKPs that encode the type of property you want to produce, to find the
FKP whose array of FCs encompasses the FC of the document character.
Then search within the FKP to find the index of the largest FC entry that is less than or equal to the FC of
the document character. Use this index to look up an offset in the array of offsets within the FKP. Add this
offset to the beginning address of the FKP in memory. This will be the first byte of the desired
CHPX/PAPX.
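The search within a single FKP can be sketched as below. The sketch is independent of the on-disk layout: it assumes the FKP's array of FCs and array of offsets have already been decoded into Python lists, and that n runs or paragraphs are bounded by n+1 FCs. The function name and that n+1 convention are assumptions made for the example.

```python
from bisect import bisect_right


def fkp_property_offset(fc, fcs, offsets):
    """Return the offset within the FKP of the CHPX/PAPX governing fc.

    fcs     -- ascending FCs bounding the runs or paragraphs (assumed n+1)
    offsets -- n offsets within the FKP, one per run or paragraph
    Returns None when this FKP does not encompass fc.
    """
    if len(fcs) != len(offsets) + 1:
        raise ValueError("expected n+1 FCs for n offsets")
    if not fcs[0] <= fc < fcs[-1]:
        return None  # the caller should try another FKP
    # Index of the largest FC entry that is less than or equal to fc.
    i = bisect_right(fcs, fc) - 1
    return offsets[i]
```

Adding the returned offset to the address of the 512-byte page in memory gives the first byte of the desired CHPX or PAPX.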
bin table
Each FKP can be viewed as a bucket or bin that contains the properties of a certain range of FCs in the Word
file. In Word files, a PLC, the plcfbte (PLex of FCs containing Bin Table Entries) is maintained. It records
the association between a particular range of FCs and the PN (Page Number) of the FKP that contains the
[...] plcfbtePapx which records the location of every PAPX FKP must be stored. In a non-complex, full-saved
document, all of the CHPX FKPs are recorded in consecutive 512-byte pages with the FKPs recorded in
ascending FC order, as are all of the PAPX FKPs. In a non-complex document, at least the first FKP page
number will be recorded so that the beginning of the consecutive range of pages may be located. However,
the bin table may be incomplete because of resource constraints placed on Word's save procedures.
If a plcfbte is incomplete, the page numbers of the first n FKPs will be recorded but the last m FKPs would
not be represented. The complete plcfbte may be reconstructed by the reader because the total number of
CHPX FKPs and PAPX FKPs is recorded in the FIB. When a reader notices that the number of entries in a
plcfbte is less than the number of FKP pages that was recorded in the FIB, the reader must locate the last
PN recorded in the plcfbte, call it pnLast. If the number of missing page entries is m, the reader would have
to read pages pnLast + 1 through pnLast + m and record the first fc stored in each of the tables plus the last
fc of page pnLast + 1 to produce a complete plcfbte.
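One possible reading of the reconstruction procedure is sketched below. The callback read_fc_range is hypothetical: it stands for whatever code reads the FKP at a given page number and returns the first and last FC stored in it. The sketch takes the closing FC from the last page read (pnLast + m); that choice is an assumption, made so that the result has one more FC than it has page numbers, as a PLCF requires.

```python
def complete_bin_table(fcs, pns, total_pages, read_fc_range):
    """Extend an incomplete plcfbte so that it covers every FKP page.

    fcs           -- the FCs recorded in the incomplete plcfbte (k+1 entries)
    pns           -- the k page numbers recorded in the incomplete plcfbte
    total_pages   -- total number of FKPs of this kind, as recorded in the FIB
    read_fc_range -- hypothetical callback: pn -> (first_fc, last_fc)
    Returns (fcs, pns) as new lists.
    """
    missing = total_pages - len(pns)
    if missing <= 0:
        return list(fcs), list(pns)
    if not pns:
        raise ValueError("at least the first FKP page number is required")
    pn_last = pns[-1]
    new_fcs = list(fcs[:len(pns)])  # starting FCs of the recorded pages
    new_pns = list(pns)
    last_fc = None
    for pn in range(pn_last + 1, pn_last + missing + 1):
        first_fc, last_fc = read_fc_range(pn)
        new_fcs.append(first_fc)  # first FC stored in each missing page
        new_pns.append(pn)
    new_fcs.append(last_fc)  # closing limit, taken from the last page read
    return new_fcs, new_pns
```

The sketch relies on the FKPs being stored in consecutive pages in ascending FC order, which the text above states for non-complex, full-saved documents.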
SEP (SEction Properties)
The data structure describing the properties of a particular section.
SEPX (SEction Property EXceptions)
A data structure describing how the properties of a particular section differ from a Word-defined standard
SEP. As in the PAPX, the differences between the SEP for a section and the standard SEP are encoded as a
list of sprms that describe how the standard SEP can be transformed into the section's SEP. By applying a
SEPX's sprms to the standard SEP, it is possible to reconstitute the SEP for that section.
The plcfsed, a data structure stored in a Word file, records the locations of all SEPXs stored in a Word
file. The array of CPs in the plcfsed records the boundaries of sections in the Word document. The second
array in the plcfsed, an array of SEDs (SEction Descriptors), is in 1-to-1 correspondence with the array of CPs.
Each SED stores the beginning FC of the SEPX that records the properties for a section. If the FC stored in
a SED is -1 (0xFFFFFFFF), the section properties of the section are exactly equal to the standard section properties.
The SEP for a particular section may be constructed if a CP of a character in that section is known. First
search the array of CPs in the plcfsed for the index of the largest CP that is less than or equal to the CP of
the character. Use this index to locate the SED in the plcfsed which describes the section. The FC stored in
the SED is the offset from the beginning of the Word file at which the SEPX is stored. If the stored FC is
equal to 0xFFFFFFFF, then the SEP for the section is exactly equal to the standard SEP (see SEP structure
definition). Otherwise, read the SEPX into memory and create a copy of the standard SEP. Finally, apply
the sprms stored in the SEPX to the standard SEP to produce the SEP for a section.
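The first half of this procedure, locating the SEPX for a character, can be sketched as follows. The sketch assumes the section table has already been decoded into a list of CPs and a list of the FCs stored in the SEDs; the names section_cps, sed_fcs, and sepx_fc_for_cp are illustrative. Applying the SEPX's sprms to a copy of the standard SEP is left out because it depends on the sprm definitions.

```python
from bisect import bisect_right

FC_STANDARD_SEP = 0xFFFFFFFF  # stored FC meaning "use the standard SEP"


def sepx_fc_for_cp(cp, section_cps, sed_fcs):
    """Return the FC of the SEPX for the section containing cp.

    section_cps -- the n+1 ascending CPs bounding the sections
    sed_fcs     -- the n FCs stored in the corresponding SEDs
    Returns None when the section uses the standard SEP unchanged.
    """
    if len(section_cps) != len(sed_fcs) + 1:
        raise ValueError("expected n+1 CPs for n SEDs")
    if not section_cps[0] <= cp < section_cps[-1]:
        raise ValueError("CP lies outside every section")
    # Index of the largest CP that is less than or equal to cp.
    i = bisect_right(section_cps, cp) - 1
    # Mask so that -1 read as a signed integer compares equal to 0xFFFFFFFF.
    fc = sed_fcs[i] & 0xFFFFFFFF
    return None if fc == FC_STANDARD_SEP else fc
```

Masking the stored value lets the same check work whether the FC was unpacked as a signed or an unsigned four-byte integer.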
DOP (DOcument Properties)
The data structure describing properties that apply to the document as a whole.
sub-document
A separate logical stream of text with properties for which correspondences with the main document text are
maintained. Word's headers/footers, footnotes, macro procedure text, and annotation text are kept in separate
subdocuments. Each subdocument has its own CP coordinate space. In other words, data structures are
stored in Word files that are components of these subdocuments. These data structures contain CP
coordinates whose 0 point is the beginning of the subdocument text stream instead of the beginning of the
main document text stream.
In full-saved documents, a simple calculation with values stored in the FIB produces the file offset of the
beginning of the subdocument text streams (if they exist). The length of these streams is also stored.
In fast-saved documents, the piece tables of subdocuments are concatenated to the end of the main
document piece table. In this case, to identify the beginning of subdocument text, you must sum the lengths
of the main document text and of any subdocument text streams that precede it, then use that sum, as a
CP coordinate, to find the physical location of each piece of the subdocument text stream.
field
A field is a two-part structure that may be recorded in the CP stream of a document. The first part of the
structure contains field codes which instruct Word for Windows to insert text into the second part of the
structure, the field result. Fields in Word for Windows are used to insert text from an external file or to quote
another part of a document, to mark index and table of contents entries and produce indexes and tables of
contents, maintain DDE links to other programs, to produce dates, times, page numbers, sequence numbers,
etc. There are 56 different field types.
A field begin mark delimits the beginning of a field and precedes any of the field codes stored in the field.
The field separator marks the end of the field codes and the beginning of the field result. A field end mark
terminates both the field result and the field itself.
The CP locations of the field begin mark, field separator, and field end mark are recorded in plcfld data
structures that are maintained for the main document and all of the subdocuments of the main document
whenever a field is inserted or edited. An array of two-byte FLD structures is stored in the plcfld in one-to-
one correspondence with the CP entries recorded. An FLD associated with a field begin mark records the
type of the field. An FLD associated with the field end mark records the current status of the field (i.e.
whether the result is dirty or has been edited, whether the result has been locked, etc.).
Fields may be nested. 20 levels of nesting are permitted.
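The begin/separator/end structure and the nesting limit can be illustrated with a small scanner. This is a sketch only: it assumes the text stream is available as a bytes object and relies on the mark codes given in the Text section of this document (19, 20 and 21). A real reader would normally take these positions from the plcfld rather than scanning the text. The name find_fields is hypothetical.

```python
FIELD_BEGIN, FIELD_SEP, FIELD_END = 19, 20, 21
MAX_NESTING = 20  # nesting limit stated in the text


def find_fields(text):
    """Return (cp_begin, cp_separator, cp_end, depth) for each field.

    cp_separator is None when a field has no separator mark. Fields are
    reported in the order their end marks are met, so inner fields come
    before the fields that contain them.
    """
    stack = []   # currently open fields as [cp_begin, cp_separator]
    fields = []
    for cp, ch in enumerate(text):
        if ch == FIELD_BEGIN:
            if len(stack) == MAX_NESTING:
                raise ValueError("more than 20 levels of field nesting")
            stack.append([cp, None])
        elif ch == FIELD_SEP and stack:
            stack[-1][1] = cp
        elif ch == FIELD_END and stack:
            begin, sep = stack.pop()
            fields.append((begin, sep, cp, len(stack) + 1))
    return fields
```

The field codes occupy the characters between cp_begin and cp_separator, and the field result occupies the characters between cp_separator and cp_end.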
bookmark
A bookmark associates a user definable name with a range of text within a document. A bookmark is
frequently used as an operand in field code instructions within a field. In Word for Windows a bookmark is
represented by three parallel data structures, the sttbBkmk, the plcbkf and the plcbkl. The sttbBkmk is a
string table which contains the name of each bookmark that is defined. The plcbkf records the beginning CP
position of each bookmark. The plcbkl records the limit CP position that delimits the end of a bookmark.
Since bookmarks may be nested within one another to any level, the BKF structure stored in the plcbkf
consists of a single index which specifies which plcbkl entry marks the end of the bookmark. Similarly, the BKL
structure stored in the plcbkl consists of a single index which specifies which plcbkf entry marks the beginning of
the bookmark.
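The way the three parallel structures fit together can be sketched as follows. The sketch assumes the tables have already been read into Python lists, and that the i-th name in the string table belongs to the i-th plcbkf entry; the function and parameter names are hypothetical.

```python
def resolve_bookmarks(names, bkf_cps, bkf_ibkl, bkl_cps):
    """Pair each bookmark name with its (cp_first, cp_lim) range.

    names     -- bookmark names from the string table
    bkf_cps   -- beginning CP of each bookmark (from the plcbkf)
    bkf_ibkl  -- index stored in each BKF, selecting a plcbkl entry
    bkl_cps   -- limit CPs (from the plcbkl)
    """
    return [
        (name, cp_first, bkl_cps[ibkl])
        for name, cp_first, ibkl in zip(names, bkf_cps, bkf_ibkl)
    ]
```

Because each BKF carries its own index into the plcbkl, the limit entries do not have to appear in the same order as the beginning entries, which is what allows bookmarks to nest.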
picture
A picture is represented in the document text stream as a special character, an ASCII 1 whose CHP has the
fSpec bit set to 1. The file location of the picture in the Word binary file is stored in the character’s CHP in
chp.fcPic. For Word for Windows, a picture may be a Windows metafile, a bitmap or a reference to a TIFF
file. Beginning at the position recorded in chp.fcPic, a header data structure, the PIC, will be stored. If the
picture is a Windows metafile or a bitmap, the metafile or bitmap will immediately follow the PIC. If the
picture is a TIFF file, the filename of the TIFF file will be recorded immediately following the PIC.
embedded object
The native data for Embedded objects (OBJs) is stored similarly to pictures (PICs). To locate the native data
for Embedded objects, scan the plc of field codes for the mother, header, footnote and annotation documents
(fib.PlcffldMom/Hdr/Ftn/Atn). For each separator field, get the CHP. If chp.fSpec = 1 and chp.fObj = 1,
then this separator field has an associated embedded object. The file location of the object data is stored in
chp.fcObj. At the specified location an object header is stored followed by the native data for the object.
See the OBJHEADER structure.
Note: In this document, bit 0 is the low-order bit. Structures are described as they would be declared in C for the
Intel architecture. When numbering bytes in a word from low offset towards high offset, two-byte integers will
have their least significant eight bits stored in byte 0 and most significant eight bits in byte 1. If bit 31 is the most
significant bit in a four-byte integer, bits 31 through 24 will be stored in byte 3 of a four-byte integer, bits 23
through 16 in byte 2, bits 15 through 8 in byte 1, and bits 7 through 0 in byte 0.
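The byte order described in the note is little-endian. As a brief illustration, Python's struct module reads such values with the "<" prefix; the helper names read_u16 and read_u32 are hypothetical.

```python
import struct


def read_u16(buf, offset):
    """Read a two-byte integer stored least significant byte first."""
    return struct.unpack_from("<H", buf, offset)[0]


def read_u32(buf, offset):
    """Read a four-byte integer stored least significant byte first."""
    return struct.unpack_from("<I", buf, offset)[0]


raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])
word = read_u16(raw, 0)    # byte 0 = 0x34 (low), byte 1 = 0x12 (high)
dword = read_u32(raw, 2)   # bytes 0x78 0x56 0x34 0x12, low byte first
```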
Naming Conventions
The names in Word data structures usually consist of a lower case sequence of characters followed by an optional upper
case modifier. The following tags are used in the lower case parts of field names to document the data type of a field:
f used to name a flag (a variable containing a Boolean value). Usually the object referred to will contain
either 1 (fTrue, TRUE) or 0 (fFalse, FALSE). (e.g. fWidowControl, fShadow)
l used to name a 4 byte integer value (a long). (e.g. lcb)
w used to name a 2 byte integer value (a word).
b used to name a 1 byte integer value
cp used to name a variable that contains a character position within the document. Always a 4 byte
quantity.
fc used to name a variable that contains an offset from the beginning of a file. Always a 4 byte quantity.
xa used to name a variable that contains a width of an object imaged on screen or on hard copy that is
measured in units of 1/1440 of an inch. This unit, which is one-twentieth of a point (1/20 * 1/72”),
is called a twip in this documentation. (e.g. xaPage is the width of a page).
ya used to name a variable that contains a height of an object imaged on screen or on hard copy that is
measured in twips.
dxa used to name a variable that contains the horizontal distance of an object measured from some
reference point expressed in twips. (e.g. pap.dxaLeft is the distance of the left boundary of a paragraph
measured from the left margin of the page)
dya used to name a variable that contains the vertical distance of an object measured from some reference
point expressed in twips. (e.g. pap.dyaAbs is the vertical distance of the top of a paragraph from a
reference frame declared in the pap).
dxp used to name a variable that contains the horizontal distance of an object measured from some
reference point expressed in Macintosh pixel units (1/72”). (e.g. dxpSpace)
dyp used to name a variable that contains the vertical distance of an object measured from some reference
point expressed in Macintosh pixel units (1/72”).
rg prefix used to signify that the data structure being defined is an array. (e.g. rgb (an array of bytes), rgcp
(an array of CPs), rgfc (an array of FCs), rgfoo (an array of foos).)
i prefix used to signify that an integer value is used as an index into an array. (e.g. itbd is an index into
rgtbd, itc is an index into rgtc.)
c prefix used to signify that an integer value is a count of some number of objects. (e.g. a cb is a count of
bytes, a cl is a count of lines, ccol is a count of columns, a cpe is a count of picture elements.)
grp prefix used to name an array of bytes that contains one or more copies of a variable length data
structure with the instances of the data structure stored one after the other in the array. (e.g. a grpprl is
a list of sprms stored one after the other.)
grpf prefix used to name an integer or byte value whose bits are used as flags. (e.g. grpfIhdt is a group of
flags that records the types of headers that are stored for a particular section of a document).
The two following modifiers are used occasionally in this documentation:
First means that variable marks the first of a range of objects. For example, cpFirst would mark the first
character position of a range of characters in a document. fcFirst would mark the file offset of the first
byte of a range of bytes stored in a file.
Lim means the variable marks the limit of a range of objects (i.e. is the index of the last object in a range
plus 1). For example, cpLim would be the limit CP of a range of characters in a document. fcLim
would be the limit file offset of a range of bytes stored in a file.
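The units and range conventions above reduce to simple arithmetic. The following sketch uses hypothetical helper names to convert twips and to compute the size of a First/Lim range.

```python
TWIPS_PER_INCH = 1440   # a twip is 1/1440 of an inch
TWIPS_PER_POINT = 20    # a twip is one-twentieth of a point


def twips_to_inches(twips):
    """Convert an xa/ya/dxa/dya value to inches."""
    return twips / TWIPS_PER_INCH


def twips_to_points(twips):
    """Convert an xa/ya/dxa/dya value to points."""
    return twips / TWIPS_PER_POINT


def range_length(first, lim):
    """Number of objects in the range [first, lim).

    The Lim value is the index of the last object plus 1, so it is not
    itself part of the range.
    """
    return lim - first
```

For example, an xaPage of 12240 twips corresponds to a page 8.5 inches wide, and a range with cpFirst = 10 and cpLim = 15 covers five characters.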
Non-Complex File Format
A Word binary file (non-complex format) consists of the Word file header (FIB), the text, and the formatting
information.
FIB
Stored at beginning of page 0 of the file. fib.fComplex will be set to zero.
text of body, footnotes, headers
Text begins at the position recorded in fib.fcMin.
group of SEPXs
SEPXs immediately follow the text and are concatenated one after the other. A SEPX may not span a 512-
byte page boundary. If a SEPX will not fit in the space that remains in a page from recording previous text
or SEPXs, space is skipped to allow the SEPX to start on a page boundary. A SEPX is guaranteed to be less
than 512 bytes in length. If all sections in the document have default properties, no SEPXs would be stored.
pictures
Word picture structures immediately follow the preceding text/SEPXs and are concatenated one after the
other if the document contains pictures.
embedded objects-native data
Word embedded object structures immediately follow the preceding text/SEPXs/picture and are
concatenated one after the other if the document contains embedded objects.
FKPs for CHPs
The first CHP FKP begins at the first 512-byte boundary after the last byte of text/SEP/picture/embedded
objects written. The remaining CHP FKPs are recorded in the 512-byte pages that immediately follow.
FKPs for PAPs
The first PAP FKP is written in the 512-byte page that immediately follows the page used to record the last
CHP FKP. The remaining PAP FKPs are recorded in the 512-byte pages that follow.
stsh (style sheet)
The style sheet is written at the beginning of the 512-byte page that immediately follows the last PAP
FKP. This is recorded in all Word for Windows documents.
plcffndRef (footnote reference position table)
Written immediately after the stsh if the document contains footnotes.
plcffndTxt (footnote text position table)
Written immediately after the plcffndRef if the document contains footnotes.
plcfandRef (annotation reference position table)
Written immediately after the plcffndTxt if the document contains annotations.
plcfandTxt (annotation text position table)
Written immediately after the plcfandRef if the document contains annotations.
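plcfsed (section table)
Written immediately after the previously recorded table. Recorded in all Word for Windows documents.
plcfphe (paragraph height table)
Written immediately after the plcfsed, if paragraph heights have been recorded.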
plcfpgd (page table)
Written immediately after the previously recorded table, if page boundary information is recorded.
sttbGlsy (glossary name string table)
Written immediately after the previously recorded table, if the document stored is a glossary.
plcfglsy (glossary entry text position table)
Written immediately after the sttbGlsy, if the document stored is a glossary.
plcfhdd (header text position table)
Written immediately after the previously recorded table, if the document contains headers or footers.
plcfbteChpx (bin table for CHP FKPs)
Written immediately after the previously recorded table. This is recorded in all Word for Windows
documents.
plcfbtePapx (bin table for PAP FKPs)
Written immediately after the plcfbteChpx. This is recorded in all Word for Windows documents.
sttbfFn (table of font name strings)
Written immediately after the plcfbtePapx. This is recorded in all Word for Windows documents. The
names of the fonts correspond to the FTC codes in the CHP structure. For example, the first font name
listed is the name for ftc = 0 (see footnote 1).
plcffldMom (table of field positions and statuses for main document)
Written immediately after the sttbfFn if the main document contains fields.
plcffldHdr (table of field positions and statuses for header subdocument)
Written immediately after the previously recorded table, if the header subdocument contains fields.
plcffldFtn (table of field positions and statuses for footnote subdocument)
Written immediately after the previously recorded table, if the footnote subdocument contains fields.
plcffldAtn (table of field positions and statuses for annotation subdocument)
Written immediately after the previously recorded table, if the annotation subdocument contains fields.
plcffldMcr (table of field positions and statuses for macro subdocument)
Written immediately after the previously recorded table, if the macro subdocument contains fields.
sttbfBkmk (table of bookmark name strings)
Written immediately after the previously recorded table, if the document contains bookmarks.
plcfBkmkf (table recording beginning CPs of bookmarks)
Written immediately after the sttbfBkmk, if the document contains bookmarks.
plcfBkmkl (table recording limit CPs of bookmarks)
Written immediately after the plcfBkmkf, if the document contains bookmarks.
cmds (recording of command data structures)
Written immediately after the previously recorded table, if special commands are linked to this document.
plcfmcr (macro text position table -- delimits boundaries of text for macros stored in macro
subdocument)
Written immediately after the previously recorded table, if a macro subdocument is recorded.
sttbfMcr (table of macro name strings)
Written immediately after the plcfmcr, if a macro subdocument is recorded.
PrEnv (data structures recording the print environment for document)
Written immediately after the previously recorded table, if a print environment is recorded for the
document.
wss (window state structure)
Written immediately after the end of previously recorded structure, if the document was saved while a
window was open.
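1 In the WinWord 1.x format, the names of the first three fonts were omitted from the table and assumed to be "Tms
Rmn" (for ftc = 0), "Symbol", and "Helv". In WinWord 2.0, the names for all fonts are included explicitly in the table. It
is still true that ftc = 0 represents the "best" Roman PS font on the system, ftc = 1 represents the Symbol font, and ftc = 2
represents the "best" Swiss (Sans Serif) PS font available.
dop (document properties record)
Written immediately after the end of previously recorded structure. This is recorded in all Word for
Windows documents.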
sttbfAssoc (table of associated strings)
Autosave source (name of original)
Written immediately after the sttbfAssoc table. This field only appears in autosave files. These files are
normal Word for Windows documents in every other way. Also, autosaved files are typically in the
complex file format except that the tables (PLCF*, etc.) are not overwritten; i.e., an autosaved file is
typically longer than the equivalent Word for Windows document.
Complex File Format
A Word binary file (complex format) consists of the Word file header (FIB), the text, and the formatting information.
FIB
Text of body, footnotes, headers stored during last full save
Text begins at the position recorded in fib.fcMin.
Group of SEPXs stored during last full save
Pictures stored during last full save
Embedded objects stored during last full save
FKPs for CHPs stored during last full save
The first CHP FKP begins at the first 512-byte boundary after the last byte of text/SEP/picture/embedded
object written. The remaining CHP FKPs are recorded in the 512-byte pages that immediately follow.
FKPs for PAPs stored during last full save
The first PAP FKP is written in the 512-byte page that immediately follows the page used to record the last
CHP FKP. The remaining PAP FKPs are recorded in the 512-byte pages that follow.
stsh (if style sheet has not grown since last full save)
Any text, SEPXs, pictures or embedded objects stored during first fast save
Any CHP FKPs stored during first fast save
Any PAP FKPs stored during first fast save
Any text, SEPXs, pictures or embedded objects stored during second fast save
Any CHP FKPs stored during second fast save
Any PAP FKPs stored during second fast save
...
Any text, SEPXs, pictures or embedded objects stored during nth fast save
Any CHP FKPs stored during nth fast save
Any PAP FKPs stored during nth fast save
stsh (if style sheet has grown since last full save)
plcffndRef (footnote reference position table)
Written immediately after the stsh if the document contains footnotes.
plcffndTxt (footnote text position table)
Written immediately after the plcffndRef if the document contains footnotes.
plcfandRef (annotation reference position table)
Written immediately after the plcffndTxt if the document contains annotations.
plcfandTxt (annotation text position table)
Written immediately after the plcfandRef if the document contains annotations.
plcfsed (section table)
Written immediately after the previously recorded table. Recorded in all Word for Windows documents.
plcfpgd (page table)
Written immediately after the previously recorded table, if page boundary information is recorded.
sttbGlsy (glossary name string table)
Written immediately after the previously recorded table, if the document stored is a glossary.
plcfglsy (glossary entry text position table)
Written immediately after the sttbGlsy, if the document stored is a glossary.
plcfhdd (header text position table)
Written immediately after the previously recorded table, if the document contains headers or footers.
plcfbteChpx (bin table for CHP FKPs)
Written immediately after the previously recorded table. This is recorded in all Word for Windows
documents.
plcfbtePapx (bin table for PAP FKPs)
Written immediately after the plcfbteChpx. This is recorded in all Word for Windows documents.
sttbfFn (table of font name strings)
Written immediately after the plcfbtePapx. This is recorded in all Word for Windows documents. The
names of the fonts correspond to the FTC codes in the CHP structure. For example, the first font name
listed is the name for ftc = 0 (see footnote 1).
plcffldMom (table of field positions and statuses for main document)
Written immediately after the sttbfFn if the main document contains fields.
plcffldHdr (table of field positions and statuses for header subdocument)
Written immediately after the previously recorded table, if the header subdocument contains fields.
plcffldFtn (table of field positions and statuses for footnote subdocument)
Written immediately after the previously recorded table, if the footnote subdocument contains fields.
plcffldAtn (table of field positions and statuses for annotation subdocument)
Written immediately after the previously recorded table, if the annotation subdocument contains fields.
plcffldMcr (table of field positions and statuses for macro subdocument)
Written immediately after the previously recorded table, if the macro subdocument contains fields.
sttbfBkmk (table of bookmark name strings)
Written immediately after the previously recorded table, if the document contains bookmarks.
plcfBkmkf (table recording beginning CPs of bookmarks)
Written immediately after the sttbfBkmk, if the document contains bookmarks.
plcfBkmkl (table recording limit CPs of bookmarks)
Written immediately after the plcfBkmkf, if the document contains bookmarks.
cmds (recording of command data structures)
Written immediately after the previously recorded table, if special commands are linked to this document.
plcfmcr (macro text position table -- delimits boundaries of text for macros stored in macro
subdocument)
Written immediately after the previously recorded table, if a macro subdocument is recorded.
sttbfMcr (table of macro name strings)
Written immediately after the plcfmcr, if a macro subdocument is recorded.
PrEnv (data structures recording the print environment for document)
Written immediately after the previously recorded table, if a print environment is recorded for the
document.
wss (window state structure)
Written immediately after the end of previously recorded structure, if the document was saved while a
window was open.
1 In the WinWord 1.x format, the names of the first three fonts were omitted from the table and assumed to be "Tms
Rmn" (for ftc = 0), "Symbol", and "Helv". In WinWord 2.0, the names for all fonts are included explicitly in the table. It
is still true that ftc = 0 represents the "best" Roman PS font on the system, ftc = 1 represents the Symbol font, and ftc = 2
represents the "best" Swiss (Sans Serif) PS font available.
dop (document properties record)
Written immediately after the end of previously recorded structure. This is recorded in all Word for
Windows documents.
sttbfAssoc (table of associated strings)
Autosave source (documented above)
File Information Block (FIB)
The FIB contains a "magic word" and pointers to the various other parts of the file, as well as information about the
length of the file. The FIB starts at the beginning of the file and fits within the first page of the file. The FIB is defined
in the structure definition section of this document.
Text
The text of the file starts at fib.fcMin. fib.fcMin is usually set to the next 128 byte boundary after the end of the FIB. The
text in a Word document is ASCII text with the following restrictions (ASCII codes given in decimal):
- Paragraph ends (or line ends in unformatted files) are stored as <Carriage Return, Line Feed> (ASCII 13,
ASCII 10). No other occurrences of this character sequence are allowed.
- Hard line breaks which are not paragraph ends are stored as ASCII 11. Other line break or word wrap
information is not stored.
- Breaking hyphens are stored as ASCII 45 (normal hyphen code). Non-required hyphens are ASCII 31.
Non-breaking hyphens are stored as ASCII 30.
- Non-breaking spaces are stored as 160. Normal spaces are ASCII 32.
- Page breaks and Section marks are ASCII 12 (normal form feed); if there's an entry in the section table,
it's a section mark, otherwise it's a page break.
- Column breaks are stored as ASCII 14.
- Tab characters are ASCII 9 (normal).
- The field begin mark which delimits the beginning of a field is ASCII 19. The field end mark which
delimits the end of a field is ASCII 21. The field separator, which marks the boundary between the
preceding field code text and following field expansion text within a field, is ASCII 20. The field escape
character is the '\' character which also serves as the formula mark.
- The cell mark which delimits the end of a cell in a table row is stored as ASCII 7 and has the fInTable
paragraph property set to fTrue (pap.fInTable == 1).
- The row mark which delimits the end of a table row is stored as ASCII 7 and has the fInTable paragraph
property and fTtp paragraph property set to fTrue (pap.fInTable == 1 && pap.fTtp == 1).
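As an illustration of how the codes listed above might be handled when extracting plain text, the sketch below maps them to ordinary characters. The choice of replacement characters (for example, a tab for a cell or row mark) is an assumption made for this example, as is the use of the cp1252 codec for the remaining bytes; the name to_plain_text is hypothetical.

```python
# Replacement text for the reserved codes; the replacements are choices
# made for this sketch, not something the file format prescribes.
CONTROL_MAP = {
    7: "\t",    # cell mark or row mark
    11: "\n",   # hard line break
    12: "\n",   # page break or section mark
    14: "\n",   # column break
    30: "-",    # non-breaking hyphen
    31: "",     # non-required hyphen
    160: " ",   # non-breaking space
}
FIELD_MARKS = {19, 20, 21}  # field begin, separator and end marks


def to_plain_text(raw):
    """Convert a raw text stream (bytes) to a plain Python string.

    Field marks are dropped, which leaves both the field codes and the
    field results in the output.
    """
    out = []
    for b in raw.replace(b"\r\n", b"\n"):   # paragraph ends are CR LF
        if b in FIELD_MARKS:
            continue
        if b in CONTROL_MAP:
            out.append(CONTROL_MAP[b])
        else:
            out.append(bytes([b]).decode("cp1252", errors="replace"))
    return "".join(out)
```

Characters that are only special when chp.fSpec is set cannot be recognized this way, because that requires the character properties as well as the text.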
The following ASCII codes are treated as "special" characters when they have the character property special on
(chp.fSpec == 1):
1 Picture
2 Autonumbered footnote reference.
3 Footnote separator character
4 Footnote continuation character
5 Annotation reference
6 Hand Annotation (Generated in Pen Windows)
Note: The end of a section is also the end of a paragraph. The last character of a section is a section mark which stands
in place of the paragraph mark normally required to end a paragraph. An exception is made for the last character of a
If !fib.fComplex, the document text stream is represented by the text beginning at fib.fcMin up to (but not including)
fib.fcMac. Otherwise, the document is represented by the piece table stored in the file in the data beginning at fib.fcClx.
The document text stream includes text that is part of the main document, plus any text that exists for the footnote,
header, macro, or annotation subdocuments. The sizes of the main document and the footnote, header, macro and
annotation subdocuments are stored in the fib, in variables fib.ccpText, fib.ccpFtn, fib.ccpHdr, fib.ccpMcr, and
fib.ccpAtn respectively. In a non-complex file, this means that:
- the text of the main document begins at fib.fcMin and extends to fib.fcMin + fib.ccpText;
- the text of the footnote subdocument begins at fib.fcMin + fib.ccpText and extends to fib.fcMin + fib.ccpText +
fib.ccpFtn;
- the text of the header subdocument begins at fib.fcMin + fib.ccpText + fib.ccpFtn and extends to fib.fcMin +
fib.ccpText + fib.ccpFtn + fib.ccpHdr;
- the text of the macro subdocument begins at fib.fcMin + fib.ccpText + fib.ccpFtn + fib.ccpHdr and extends to
fib.fcMin + fib.ccpText + fib.ccpFtn + fib.ccpHdr + fib.ccpMcr;
- the text of the annotation subdocument begins at fib.fcMin + fib.ccpText + fib.ccpFtn + fib.ccpHdr + fib.ccpMcr
and extends to fib.fcMin + fib.ccpText + fib.ccpFtn + fib.ccpHdr + fib.ccpMcr + fib.ccpAtn.
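The running sums for a non-complex file can be sketched as a short function. It assumes the FIB values have already been read; the function name, parameter names and dictionary keys are hypothetical. Each range is returned as (fcFirst, fcLim), so the second value is not part of the range.

```python
def subdocument_ranges(fc_min, ccp_text, ccp_ftn, ccp_hdr, ccp_mcr, ccp_atn):
    """Return the file offset range of each text stream in a
    non-complex file, in the order the streams are stored."""
    ranges = {}
    fc = fc_min
    for name, ccp in (
        ("main", ccp_text),
        ("footnote", ccp_ftn),
        ("header", ccp_hdr),
        ("macro", ccp_mcr),
        ("annotation", ccp_atn),
    ):
        ranges[name] = (fc, fc + ccp)
        fc += ccp   # the next stream starts where this one ends
    return ranges
```

A subdocument that does not exist has a count of zero and therefore an empty range. Dropping the fc_min term (passing 0) yields the corresponding CP boundaries.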
In a complex, fast-saved file, the main document text must be located by examining the piece table entries from the 0th
piece table entry through the piece table entry that describes cp = fib.ccpText.
A footnote subdocument's text must be located by examining the piece table entries beginning with the one that describes
cp = fib.ccpText through the entry that describes cp = fib.ccpText + fib.ccpFtn.
A header subdocument's text must be located by examining the piece table entries beginning with the one that describes
cp = fib.ccpText + fib.ccpFtn through the entry that describes cp = fib.ccpText + fib.ccpFtn + fib.ccpHdr.
A macro subdocument's text must be located by examining the piece table entries beginning with the one that describes
cp = fib.ccpText + fib.ccpFtn + fib.ccpHdr through the entry that describes cp = fib.ccpText + fib.ccpFtn + fib.ccpHdr +
fib.ccpMcr.
An annotation subdocument's text must be located by examining the piece table entries beginning with the one that
describes cp = fib.ccpText + fib.ccpFtn + fib.ccpHdr + fib.ccpMcr through the entry that describes cp = fib.ccpText +
fib.ccpFtn + fib.ccpHdr + fib.ccpMcr + fib.ccpAtn.
Character and Paragraph Formatting Properties
Character and paragraph properties in Word documents are stored in a compressed format. The information that is stored
on disk is not the actual properties of a particular sequence of text but the difference of the properties of a sequence from
some reference property.
The PAP is a data structure that holds uncompressed paragraph property information; the CHP (pronounced like "chip")
is a structure that holds uncompressed character property information. Each paragraph in a Word document inherits a
default set of paragraph and character properties from one of the styles recorded in the style sheet data structure (STSH).
A particular PAP is converted into its compressed form, the PAPX, by first comparing the pap for a paragraph with the
pap stored in the style sheet for the paragraph's style. Any properties in the paragraph's PAP that are different from those
stored in the style sheet PAP are encoded as a list of sprms (grpprl). sprms express how the content of the style sheet
PAP should be transformed to create the properties for the paragraph. A PAPX is a variable-length data structure that
begins with a count of words that encodes the PAPX length. It contains a STC (style code) which specifies which style
entry in the style sheet contains the default paragraph and character properties for the paragraph, paragraph height
information, and the list of difference sprms. If the only difference between the paragraph's PAP and the style's PAP
were in the justification code field, which is one byte long, one two-byte sprm, sprmPJc, would be generated to express
that difference; thus the total PAPX size would be 9 bytes. This is better than 23-to-1 compression since the total size of a
PAP is 210 bytes.
To convert a CHP for a sequence of characters contained within a single paragraph into its compressed form, the
CHPX, first clear a local instance of a CHP to zeros. (A CHPX has the same form as a CHP, but is interpreted with a
different algorithm.) This local instance will be transformed into the CHPX. In the CHP are a set of properties encoded
as single bits (e.g. fBold, fItalic), and another set of properties encoded as multi-bit, byte or word fields (FTC,
HPS). When there is a difference in one of the single-bit properties between the character sequence property and the style
property, the bit for that property is set to 1 in the local instance. Any single-bit properties that are the same in the two
versions will be left 0 in the local instance. The idea is that when a CHPX is interpreted, all of the single-bit properties in
the CHPX will be xor-ed with the single-bit properties in the style's character properties. This will produce a CHP that
has the same single bit settings as the character property of the character sequence. Each of the multi-bit, byte, and word
fields in a CHP are assigned a bit in the first 16 bits of the CHPX which, when true, will mean that there is a difference
in the corresponding non-bit field.
For example the font code field in the CHP (chp.ftc) is assigned a bit in the first 16 bits called chp.fsFtc. When a
difference is detected in a non-bit field, the difference bit in the first 16 bits of the CHP that corresponds to that property
is set to 1, and the contents of the field in character sequence's CHP is copied to the local instance. If one of the non-bit
fields is unchanged from its setting in the style's CHP the equivalent value stored in the CHPX will be 0. Since only the
non-zero prefix of a CHPX is recorded in Word files, fairly good compression can be achieved. For example, if the
character sequence CHP was changing the boldness of the style character property and was changing the font code to 0,
a difference bit for boldness would be recorded in the 0th byte of the CHPX and a difference bit for the font code would
be set in the 1st byte of the CHPX. In this case the CHPX would consist of a 1-byte length code plus a 2-byte non-zero
prefix. This would be 4-to-1 compression.
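The difference encoding can be modelled in a few lines. This is a simplified sketch, not the on-disk layout: the single-bit properties are held in one integer mask, the non-bit fields in a dictionary, and the presence of a key in the CHPX dictionary stands in for the per-field difference bit (such as fsFtc). The bit values and function names are hypothetical.

```python
F_BOLD = 0x01     # hypothetical bit positions, for illustration only
F_ITALIC = 0x02


def make_chpx(style_bits, run_bits, style_fields, run_fields):
    """Build a CHPX-like difference record for a run of characters."""
    diff_bits = style_bits ^ run_bits   # 1 wherever a flag differs
    diff_fields = {
        name: value
        for name, value in run_fields.items()
        if style_fields.get(name) != value
    }
    return diff_bits, diff_fields


def apply_chpx(style_bits, style_fields, chpx):
    """Recover the run's properties from the style and the CHPX."""
    diff_bits, diff_fields = chpx
    fields = dict(style_fields)
    fields.update(diff_fields)          # overwrite only changed fields
    return style_bits ^ diff_bits, fields
```

Tracking changed fields separately from their values is what lets a change to a value of 0 (such as the font code in the example above) be told apart from no change at all.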
If a sequence of characters has the same character properties and the sequence spans more than one paragraph, it's
necessary to examine each paragraph's properties and to generate a different CHPX every time there is a change of style.
In Word documents, the fundamental unit of text for which character exception information is kept is the run of
exception text, a contiguous sequence of characters stored on disk that all have the same exception properties with
respect to their underlying style character properties. Each run would have an entry recorded in a CHPX FKP. If a user
never changed the character properties inherited from the styles used in his document and did a complete save of his
document, although each of those styles may have different properties, the entire document stream would be one large
run of exception text and one CHPX would suffice to describe the character properties of the entire document.
The fundamental unit of text for which paragraph properties are recorded is the paragraph. Every paragraph has an
entry recorded in a PAPX FKP.
The CHPX FKP and the PAPX FKP have the same physical structure. An FKP is a 512-byte data structure that is
stored in one page of a Word file. At offset 511 is a 1-byte count named crun, which is a count of runs of exception text
for CHPX FKPs and which is a count of paragraphs in PAPX FKPs. Beginning at offset 0 of the FKP is an array of
crun + 1 FCs, named rgfc, which records the beginning and limit FCs of crun runs of exception text or paragraphs.
Immediately following rgfc is a byte array of crun word offsets to CHPXs or PAPXs from the beginning of the FKP.
This byte array, named rgb, is in 1-to-1 correspondence with the rgfc. The ith rgb gives the word offset of the
exception property that belongs to the run/paragraph whose beginning in FC space is rgfc[i] and whose limit is
rgfc[i+1] in FC space.
The fact that the value stored in rgb is a word offset implies that CHPXs and PAPXs are stored in FKPs beginning on
word boundaries. Since the values stored in the rgb allow random access throughout the FKP, space within an FKP can
be conserved by storing the offset of the same physical CHPX/PAPX in rgb entries when several runs or paragraphs in
the FKP have the same properties. Word uses this optimization.
An rgb value of 0 is used in another optimization. When a rgb value of 0 is stored in an FKP, it means that instead of
referring to a particular CHPX/PAPX in the FKP the 0 value is a signal that the reader should construct for itself a
commonly encountered predefined set of properties.
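The FKP layout described above can be read with a short routine. This is a sketch under the stated layout only (crun at offset 511, rgfc at offset 0, rgb immediately after rgfc); it does not decode the CHPX or PAPX itself, and the name parse_fkp is hypothetical.

```python
import struct

FKP_SIZE = 512


def parse_fkp(page):
    """Return a list of (fcFirst, fcLim, byte_offset) for one FKP.

    byte_offset is the offset of the CHPX/PAPX from the start of the
    FKP, or None when the rgb entry is 0 (predefined properties).
    """
    if len(page) != FKP_SIZE:
        raise ValueError("an FKP occupies exactly one 512-byte page")
    crun = page[511]
    rgb_start = 4 * (crun + 1)               # rgfc holds crun + 1 FCs
    if rgb_start + crun > 511:
        raise ValueError("crun is too large for a 512-byte FKP")
    rgfc = struct.unpack_from("<%dI" % (crun + 1), page, 0)
    rgb = page[rgb_start:rgb_start + crun]   # one byte per run
    runs = []
    for i in range(crun):
        word_offset = rgb[i]
        byte_offset = word_offset * 2 if word_offset else None
        runs.append((rgfc[i], rgfc[i + 1], byte_offset))
    return runs
```

Several runs may report the same byte_offset, since the text notes that identical properties are stored once and shared.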
pixels, with a column width of 7980 dxas.
When new entries are added to an FKP, there must be unallocated space in the middle of the FKP equal to 5 bytes (size
of an FC plus size of one-byte word offset), plus the size of the new CHPX or PAPX if the property being added is not
already recorded in the FKP and is not the property coded with a 0 rgb value. To add a new property, existing rgb
entries are moved four bytes to the right in the FKP. The new FC is added at the end of the rgfc. The new CHPX or
PAPX is recorded on a 2-byte boundary before the previously recorded properties stored at the end of the block. The
word offset of the beginning of the CHPX or PAPX is stored as the last entry of the relocated rgb, and finally, the crun
stored at offset 511 is incremented.
Bin Tables
A bin table (plcfbte) partitions the total extent of the Word file that contains text characters into a set of contiguous
intervals marked by an fcFirst and an fcLim. The fcFirst for the nth interval would be plcfbte.rgfc[n] and the fcLim for
the nth interval would be plcfbte.rgfc[n+1]. Associated with each interval is a BTE. A BTE holds a two-byte PN (page
number) which identifies the FKP page in the file which contains the formatting information for that interval. A CHPX
FKP further partitions an interval into runs of exception text. A PAPX FKP in a non-complex, full-saved file, partitions
the text within intervals into paragraphs. If a file is in complex format (has been fast-saved), the PAPX FKP only
records the FCs within the text that are preceded by a paragraph mark. Even though a sequence of text may be physically
located between two paragraph end marks, it may reside in a paragraph different from the one defined by the following
paragraph end mark, because the text may have been moved by the user into a different paragraph. In the logical text
stream represented by the document's piece table, the paragraph mark that follows the moved text is stored in a non-
adjacent physical location in the file.
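As an illustration of how a bin table is consulted, the sketch below finds the FKP page for a given FC. The names fkp_page_for_fc and fkp_file_offset are hypothetical, the bin table is assumed to be already unpacked into two Python lists, and the file offset of a page is assumed to be its PN multiplied by the 512-byte page size.

```python
from bisect import bisect_right

PAGE_SIZE = 512  # assumption: page number n starts at file offset n * 512


def fkp_page_for_fc(rgfc, rgpn, fc):
    """Return the PN of the FKP that covers file position fc.

    rgfc holds the n + 1 interval boundaries of the bin table and rgpn the
    PN taken from each of its n BTEs.
    """
    if len(rgfc) != len(rgpn) + 1:
        raise ValueError("a bin table with n BTEs has n + 1 FCs")
    if not rgfc[0] <= fc < rgfc[-1]:
        raise ValueError("fc lies outside the text covered by the bin table")
    n = bisect_right(rgfc, fc) - 1      # rgfc[n] <= fc < rgfc[n + 1]
    return rgpn[n]


def fkp_file_offset(pn):
    """Byte offset of FKP page pn under the page-size assumption above."""
    return pn * PAGE_SIZE


rgfc = [768, 1500, 2300, 4096]
rgpn = [9, 10, 11]
pn = fkp_page_for_fc(rgfc, rgpn, 1500)  # fc 1500 opens the second interval
```

A binary search is used because the boundaries are contiguous and ascending; a linear scan would work equally well for small tables.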
Style Sheets
The style sheet establishes a correspondence between a style code (a number in the range 0-255) and a style, a set of
paragraph and character formatting information. When a particular style code is recorded in the pap.stc, the paragraph
properties corresponding to the style code are used as the default paragraph properties for the paragraph and the character
properties corresponding to the style code are used as the default character properties for all paragraph characters. Any
property differences from these "defaults" are recorded in the CHPXs and PAPXs discussed previously.
A style is always based on another style, with the differences being stored as CHPXs and PAPXs in the style sheet.
There can be a chain of "based on" styles up to 10 deep; that is, a style can have up to 9 "ancestors" (not including itself).
Every chain must eventually end at the null style (style code 222). The null style, not based on any other
style, has the standard character and paragraph properties. The standard CHP has chp.hps = 20 with all other fields set to
zero. The standard PAP has all fields set to zero. To arrive at the final CHP and PAP of a paragraph, the CHPXs and
PAPXs in the based-on chain must all be applied, starting at the null style.
The style sheet is stored in the file in the following format:
Field       Size              Comment
cstcStd     2 bytes           count of standard STCs used in this document
sttbName:
  cbName    2 bytes           count of bytes in sttbName (including cb)
  grpst     cbName - 2        group of style names, stored as st's (string preceded by
                              length byte)
sttbChpx:
  cbChpx    2 bytes           cb of sttbChpx (inclusive)
  grpst     cbChpx - 2        group of CHPXs, where CHPXs are modifications from the
                              character properties of the base style whose style code is
                              recorded in the plestcp
sttbPapx:
  cbPapx    2 bytes           cb of sttbPapx (inclusive)
  grpst     cbPapx - 2        group of PAPXs, where PAPXs are modifications from the
                              paragraph properties of the base style whose style code is
                              recorded in the plfestcp
plfestcp:
  iMac      2 bytes           count of entries in dnstcp
  dnstcp    2 * iMac bytes    array of estcp's (see below)
The three sttbs and the plestcp are parallel structures, indexed by stcps. Using stcps instead of STCs to index the sttbs
minimizes the amount of space taken by undefined styles in the style sheet. A "defined" style is one whose sttbName
entry is not 255 (see below). A style needs to be defined if it is either referenced explicitly in the document or if a style
which is referenced is based on it. The stcps are derived as follows:
The styles must be stored sequentially in the style sheet. Style codes in the range 1-221 are reserved for user-defined
styles (not all of which are necessarily defined for a particular document) and those in the range 222-255 and 0 are
reserved for standard styles (again, not all of which are necessarily defined). If we start with the minimum defined
standard style, list all the rest of the standard styles, followed by the user-defined ones, up to the maximum one that is
defined, we have a sequential list which is of minimal size that encompasses all of the defined styles. The stcps are
simply the indices of this transformed list of STCs. There is a 1-to-1 mapping from STCs to stcps and back:
stcp = (stc + cstcStd) & 255
stc = (stcp - cstcStd) & 255
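The two formulas translate directly into code. This is a small sketch with hypothetical function names; the value 3 used for cstcStd is only an example and is not taken from any particular file.

```python
def stcp_from_stc(stc, cstc_std):
    """Map a style code to its index in the sttbs and the dnstcp."""
    return (stc + cstc_std) & 255


def stc_from_stcp(stcp, cstc_std):
    """Map an index back to its style code (inverse of stcp_from_stc)."""
    return (stcp - cstc_std) & 255


cstc_std = 3                                    # example value only
first_index = stcp_from_stc(253, cstc_std)      # wraps around to 0
normal_index = stcp_from_stc(0, cstc_std)       # Normal lands at index cstc_std
```

The mask with 255 performs the wrap-around, so high standard style codes land at the start of the list and are followed by style code 0 and the user-defined codes.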
The dnstcp is an array which can be indexed by the stcp derived from the stc. The grpst's in the sttbs are constructed
from a number of variable length st's (or CHPXs or PAPXs) run together. Since the length of an st is encoded right in the
st, the nth st can be extracted from a grpst by scanning sequentially from the beginning of the grpst.
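The sequential scan can be sketched as follows. The function name nth_st is hypothetical, and the sketch assumes that a length byte of 255 (the marker for an undefined entry described in this section) stands alone with no payload bytes after it.

```python
def nth_st(grpst: bytes, n: int):
    """Return the payload of the n-th st (0-based) in a grpst.

    Returns None when the entry is the single-byte marker 255.
    """
    pos = 0
    for index in range(n + 1):
        if pos >= len(grpst):
            raise IndexError("grpst holds fewer than n + 1 entries")
        cch = grpst[pos]
        # Assumption: 255 is a marker byte that is not followed by a payload.
        size = 0 if cch == 255 else cch
        if index == n:
            return None if cch == 255 else grpst[pos + 1:pos + 1 + size]
        pos += 1 + size


# Three entries: an empty st, an undefined marker, and a 6-byte name.
grpst = bytes([0, 255, 6]) + b"MyName"
name = nth_st(grpst, 2)
```

Because every entry must be stepped over to reach the next one, access cost grows with n; a reader that needs many entries may prefer to build an index in one pass.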
An estcp is a 2-byte structure composed of two 1-byte STCs:
stcNext When the user breaks a paragraph having the current style by inserting a paragraph mark, the
newly created paragraph will be given the default properties of style stcNext (must be a
defined style); the default stcNext for an stc is itself
stcBase the style that this style is based on (must be a defined style)
The st's in the grpst's above can have special meanings. In sttbName, if an st is 255 the style is not defined (in which
case the CHPX and PAPX should also be 255). In the sttbName, if the length byte of the st is 0, it is a standard style
which has the internal built-in name (no alternates added). If there are any alternate names for a style, they are appended
together and separated by commas. The internal names for standard styles may not be changed (though alternates may
be appended) or used as alternates for any other style. They are eternally wed to the standard style.
If the length byte of the st is 0 in the sttbChpx or the sttbPapx, there are no differences from the properties of the style it
is based on. If the length byte of the name is not 255, but the length byte of the CHPX or PAPX is 255 (if either one is,
the other one should also be), then it is a standard style whose character or paragraph properties (respectively) have not
changed from the internal built-in styles (described below). For user-defined styles, the name must be at least one non-
space character.
STCs 222 through 255 and 0 are reserved for "standard" styles, e.g. styles for headers, footers, page numbers, etc. (this
leaves 1-221 for user-defined styles). The "normal" style is the style used as a default for all paragraphs in a document
that don't have a style. The default (built-in) property exceptions (differences from "based-on" style) for standard styles are:
0 Normal standard PAP (standard PAP has all fields cleared to 0), standard CHP
(chp.hps = 20, all other fields set to 0).
255 Normal indent pap.dxaLeft = 720.
/* Heading levels */
254 heading 1 pap.dyaBefore = 240 (12 points), chp.fBold = negation of Normal style's
chp.fBold, chp.kul = 1 (single underline), chp.hps = 24, chp.ftc = 2 .
253 heading 2 pap.dyaBefore = 120 (6 points), chp.fBold = negation of Normal style's
chp.fBold, chp.hps = 24, chp.ftc = 2
252 heading 3 pap.dxaLeft = 360, chp.fBold = negation of Normal style's chp.fBold,
chp.hps = 24;
251 heading 4 pap.dxaLeft = 360, chp.kul = 1 (single underline), chp.hps = 24;
250 heading 5 pap.dxaLeft = 720, chp.fBold = negation of Normal style's chp.fBold,
chp.hps = 20;
249 heading 6 pap.dxaLeft = 720, chp.kul = 1 (single underline), chp.hps = 20;
248 heading 7 pap.dxaLeft = 720, chp.fItalic = negation of Normal style's chp.fItalic,
chp.hps = 20;
247 heading 8 pap.dxaLeft = 720, chp.fItalic = negation of Normal style's chp.fItalic,
chp.hps = 20;
246 heading 9 pap.dxaLeft = 720, chp.fItalic = negation of Normal style's chp.fItalic,
chp.hps = 20;
245 footnote text chp.hps = 20
244 footnote reference chp.hps = 16; hpsPos = 6
243 header When running a U.S. system file:
pap.itbdMac = 2, pap.rgdxaTab[0] = 3 * 1440, pap.rgtbd[0].jc = 1,
pap.rgtbd[0].tlc = 0, pap.rgdxaTab[1] = 6* 1440, pap.rgtbd[1].jc = 1,
pap.rgtbd[1].tlc = 0;
When running an International metric system:
pap.itbdMac = 2, pap.rgdxaTab[0] =3969, pap.rgtbd[0].jc = 1,
pap.rgtbd[0].tlc = 0, pap.rgdxaTab[1] = 8504, pap.rgtbd[1].jc = 1,
pap.rgtbd[1].tlc = 0;
242 footer When running a U.S. system file:
pap.itbdMac = 2, pap.rgdxaTab[0] = 3 * 1440, pap.rgtbd[0].jc = 1,
pap.rgtbd[0].tlc = 0, pap.rgdxaTab[1] = 6* 1440, pap.rgtbd[1].jc = 1,
pap.rgtbd[1].tlc = 0;
When running an International metric system:
pap.itbdMac = 2, pap.rgdxaTab[0] =3969, pap.rgtbd[0].jc = 1,
pap.rgtbd[0].tlc = 0, pap.rgdxaTab[1] = 8504, pap.rgtbd[1].jc = 1,
pap.rgtbd[1].tlc = 0;
241 index heading same as properties for Normal style (stc == 0)
240 line number same as properties for Normal style (stc == 0)
/* Index entries: */
When running on U.S. system file
all have pap.dxaLeft = (index level number- 1) *360
When running on an International metric system
all have pap.dxaLeft = (index level number- 1) *283
239 index 1
238 index 2
237 index 3
236 index 4
235 index 5
234 index 6
233 index 7
/* Table of Contents entries: */
When running on U.S. system file
pap.itbdMac = 2, pap.rgdxaTab[0] = 8280, pap.rgtbd[0].jc = 0, pap.rgtbd[0].tlc = 1, pap.rgdxaTab[1] =
8640, pap.rgtbd[1].jc = 2, pap.rgtbd[1].tlc = 0;
pap.dxaRight =720
pap.dxaLeft = (table of contents level number - 1) * 720
When running an International metric system:
pap.itbdMac = 1, pap.rgdxaTab[0] = 8280, pap.rgtbd[0].jc = 2, pap.rgtbd[0].tlc = 1,
pap.dxaRight =850
pap.dxaLeft = (table of contents level number - 1) * 2835
232 toc 1
231 toc 2
230 toc 3
229 toc 4
228 toc 5
227 toc 6
226 toc 7
225 toc 8
224 annotation text chp.hps = 20
223 annotation reference chp.hps = 16
223 /* reserved */
222 Null stc (no name) all pap fields = 0, standard character props (chp.ftc = 2, chp.hps =
24);
Even if a document has no style sheet, the minimal STSH that must be written to the file takes up 17 bytes and is:
cstcStd = 0
/* one entry in each of the sttbs: */
sttbName:
cb = 3 (includes itself and the "\0")
0 (empty string)
sttbChpx:
cb = 8
CHPX:
cb = 5
/* all zeros except: */
fsHps = True
hps = 20 (decimal)
sttbPapx:
cb = 6
PAPX:
cb = 3
stc = 0
paph = 0
/* one entry in the plfestcp: */
iMac = 1
SPRM Definitions
A sprm is an instruction to modify one or more properties within one of the property defining data structures (CHP,
PAP, TAP, SEP, or PIC). A sprm always begins with a one byte opcode at offset 0 which identifies the operation to be
performed. If necessary information for the operation can always be expressed with a fixed length parameter, the fixed
length parameter is recorded immediately after the opcode beginning at offset 1. The length of a fixed length sprm is
always 1 plus the size of the sprm's parameter. If the parameter for the sprm is variable length, the count of bytes of the
following parameter is stored in the byte at offset 1.
Two sprms, sprmPChgTabs and sprmTDefTable, can be longer than 256 bytes. The method for calculating the length of
sprmPChgTabs is recorded below with the description of the sprm. For sprmTDefTable, the length of the parameter plus
1 is recorded in the two bytes beginning at offset 1.
For variable length sprms, the total length of the sprm is the count recorded at offset 1 plus two. The parameter
immediately follows the count.
Unless otherwise noted, when a sprm is applied to a property the sprm's parameter changes the old value of the property
in question to the value stored in the sprm parameter.
Name op code Property Modified Parameter Parameter size
sprmPStc 2 pap.stc stc (style code) byte
sprmPStcPermute 3 pap.stc permutation vector variable length
(see below)
sprmPIncLevel 4 pap.stc difference between byte
stc of base PAP and
stc of PAP to be
produced (see
below)
sprmPJc 5 pap.jc jc (justification) byte
sprmPFSideBySide 6 pap.fSideBySide 0 or 1 byte
sprmPFKeep 7 pap.fKeep 0 or 1 byte
sprmPFKeepFollow 8 pap.fKeepFollow 0 or 1 byte
sprmPPageBreakBefore 9 pap.fPageBreakBefore 0 or 1 byte
sprmPBrcl 10 pap.brcl brcl byte
sprmPBrcp 11 pap.brcp brcp byte
sprmPNfcSeqNumb 12 pap.nfcSeqNumb nfc byte
sprmPNoSeqNumb 13 pap.nnSeqNumb nn byte
sprmPFNoLineNumb 14 pap.fNoLnn 0 or 1 byte
sprmPChgTabsPapx 15 pap.itbdMac, pap.rgdxaTab, complex - see variable length
pap.rgtbd below
sprmPDxaRight 16 pap.dxaRight dxa word
sprmPDxaLeft 17 pap.dxaLeft dxa word
sprmPNest 18 pap.dxaLeft dxa-see below word
sprmPDxaLeft1 19 pap.dxaLeft1 dxa word
sprmPDyaLine 20 pap.dyaLine dya word
sprmPDyaBefore 21 pap.dyaBefore dya word
sprmPDyaAfter 22 pap.dyaAfter dya word
sprmPChgTabs 23 pap.itbdMac, pap.rgdxaTab, complex - see variable length
pap.rgtbd below
sprmPFInTable 24 pap.fInTable 0 or 1 byte
sprmPDyaAbs 27 pap.dyaAbs dya word
sprmPDxaWidth 28 pap.dxaWidth dxa word
sprmPPc 29 pap.pcHorz, pap.pcVert complex - see byte
below
sprmPBrcTop10 30 pap.brcTop BRC10 word
sprmPBrcLeft10 31 pap.brcLeft BRC10 word
sprmPBrcBottom10 32 pap.brcBottom BRC10 word
sprmPBrcRight10 33 pap.brcRight BRC10 word
sprmPBrcBetween10 34 pap.brcBetween BRC10 word
sprmPBrcBar10 35 pap.brcBar BRC10 word
sprmPFromText10 36 pap.dxaFromText dxa word
sprmPBrcTop 38 pap.brcTop BRC word
sprmPBrcLeft 39 pap.brcLeft BRC word
sprmPBrcBottom 40 pap.brcBottom BRC word
sprmPBrcRight 41 pap.brcRight BRC word
sprmPBrcBetween 42 pap.brcBetween BRC word
sprmPBrcBar 43 pap.brcBar BRC word
sprmPWHeightAbs 45 pap.wHeightAbs w word
sprmPShd 47 pap.shd SHD word
sprmPDyaFromText 48 pap.dyaFromText dya word
sprmPDxaFromText 49 pap.dxaFromText dxa word
sprmPFBiDi 50 pap.fBiDi
sprmCFStrikeRM 53 chp.fRMarkDel 1 or 0 bit
sprmCFRMark 54 chp.fRMark 1 or 0 bit
sprmCFFldVanish 55 chp.fFldVanish 1 or 0 bit
sprmCDefault 57 whole CHP (see below) none variable length
sprmCPlain 58 whole CHP (see below) none 0
sprmCFBold 60 chp.fBold 0,1, 128, or 129 byte
(see below)
sprmCFItalic 61 chp.fItalic 0,1, 128, or 129 byte
(see below)
sprmCFStrike 62 chp.fStrike 0,1, 128, or 129 byte
(see below)
sprmCFOutline 63 chp.fOutline 0,1, 128, or 129 byte
(see below)
sprmCFShadow 64 chp.fShadow 0,1, 128, or 129 byte
(see below)
sprmCFSmallCaps 65 chp.fSmallCaps 0,1, 128, or 129 byte
(see below)
sprmCFCaps 66 chp.fCaps 0,1, 128, or 129 byte
(see below)
sprmCFVanish 67 chp.fVanish 0,1, 128, or 129 byte
(see below)
sprmCFtc 68 chp.ftc ftc word
sprmCKul 69 chp.kul kul byte
sprmCSizePos 70 chp.hps, chp.hpsPos (see below) 3 bytes
sprmCQpsSpace 71 chp.qpsSpace qps word
sprmCLid 72 chp.lid LID word
sprmCIco 73 chp.ico ico byte
sprmCHpsPos 76 chp.hpsPos hps byte
sprmCHpsPosAdj 77 chp.hpsPos hps (see below) byte
sprmCMajority 78 whole CHP complex (see length byte
below) plus 8 bytes
sprmCFBoldBi 80 chp.fBoldBi 0, 1, 128 or 129 byte
(see below)
sprmCFItalicBi 81 chp.fItalicBi 0, 1, 128 or 129 byte
(see below)
sprmCFtcBi 82 chp.ftcBi ftcBi word
sprmClidBi 83 chp.lidBi LID word
sprmCIcoBi 84 chp.icoBi ico byte
sprmCHpsBi 85 chp.hpsBi hps byte
sprmCFBiDi 86 chp.fBiDi 0, 1, 128 or 129 byte
(see below)
sprmCFDiacColor 87 chp.fDiacUSico 0, 1, 128 or 129 byte
(see below)
sprmPicBrcl 94 pic.brcl brcl (see PIC byte
structure definition)
sprmPicScale 95 pic.mx, pic.my, complex (see length byte
pic.dxaCropleft, below) plus 12 bytes
pic.dyaCropTop
pic.dxaCropRight,
pic.dyaCropBottom
sprmPicBrcTop 96 pic.brcTop BRC word
sprmPicBrcLeft 97 pic.brcLeft BRC word
sprmPicBrcBottom 98 pic.brcBottom BRC word
sprmPicBrcRight 99 pic.brcRight BRC word
sprmSFRTLGutter 112 sep.fRTLGutter 0, 1, 128 or 129 byte
(see below)
sprmSFBiDi 114 sep.fBiDi 0, 1, 128 or 129 byte
(see below)
sprmSDmBinFirst 115 sep.dmBinFirst word
sprmSDmBinOther 116 sep.dmBinOther word
sprmSBkc 117 sep.bkc bkc byte
sprmSFTitlePage 118 sep.fTitlePage 0 or 1 byte
sprmSCcolumns 119 sep.ccolM1 # of cols - 1 word
sprmSDxaColumns 120 sep.dxaColumns dxa word
sprmSFAutoPgn 121 sep.fAutoPgn obsolete byte
sprmSNfcPgn 122 sep.nfcPgn nfc byte
sprmSDyaPgn 123 sep.yaPage ya word
sprmSDxaPgn 124 sep.xaPage xa word
sprmSFPgnRestart 125 sep.fPgnRestart 0 or 1 byte
sprmSFEndnote 126 sep.fEndnote 0 or 1 byte
sprmSLnc 127 sep.lnc lnc byte
sprmSGprfIhdt 128 sep.grpfIhdt grpfihdt (see byte
Headers and
Footers topic)
sprmSNLnnMod 129 sep.nLnnMod non-neg int. word
sprmSDxaLnn 130 sep.dxaLnn dxa word
sprmSVjc 134 sep.vjc vjc byte
sprmSLnnMin 135 sep.lnnMin lnn word
sprmSPgnStart 136 sep.pgnStart pgn word
sprmSBOrientation 137 sep.morPage mor (CHAR) byte
sprmSFFacingCol 138 sep.fFacingCol 0, 1, 128 or 129 byte
(see below)
sprmSXaPage 139 sep.xaPage xa word
sprmSYaPage 140 sep.yaPage ya word
sprmSDxaLeft 141 sep.dxaLeft dxa word
sprmSDxaRight 142 sep.dxaRight dxa word
sprmSDyaTop 143 sep.dyaTop dya word
sprmSDyaBottom 144 sep.dyaBottom dya word
sprmSDzaGutter 145 sep.dzaGutter dza word
sprmTJc 146 tap.jc jc word (low
order byte is
significant)
sprmTDxaLeft 147 tap.rgdxaCenter (see below) dxa word
sprmTDxaGapHalf 148 tap.dxaGapHalf, dxa word
tap.rgdxaCenter (see below)
sprmTFBiDi 149 tap.fBiDi 0, 1, 128 or 129 byte
(see below)
sprmTDefTable10 152 tap.rgdxaCenter, tap.rgtc complex (see variable length
below)
sprmTDyaRowHeight 153 tap.dyaRowHeight dya word
sprmTDefTable 154 complex (see 0
below)
sprmTDefTableShd 155 tap.rgshd complex (see 0
below)
sprmTSetBrc 157 tap.rgtc[].rgbrc complex (see 5 bytes
below)
sprmTInsert 158 tap.rgdxaCenter,tap.rgtc complex (see 4 bytes
below)
sprmTDelete 159 tap.rgdxaCenter, tap.rgtc complex (see word
below)
sprmTDxaCol 160 tap.rgdxaCenter complex (see 4 bytes
below)
sprmTMerge 161 tap.fFirstMerged, tap.fMerged complex (see word
below)
sprmTSplit 162 tap.fFirstMerged, tap.fMerged complex (see word
below)
sprmTSetBrc10 163 tap.rgtc[].rgbrc complex (see 5 bytes
below)
sprmTSetShd 164 tap.rgshd complex (see 4 bytes
below)
sprmMax 165
The paragraph sprms used to encode paragraph properties in a PAPX are: sprmPJc, sprmPFSideBySide, sprmPFKeep,
sprmPFKeepFollow, sprmPFPageBreakBefore, sprmPBrcp, sprmPPc, sprmPBrcl, sprmPFNoLineNumb,
sprmPDxaRight, sprmPDxaLeft, sprmPDxaLeft1, sprmPDyaLine, sprmPDyaBefore, sprmPDyaAfter, sprmPFInTable,
sprmPFTtp, sprmPDxaAbs, sprmPDyaAbs, sprmPDxaWidth, sprmPBrcTop, sprmPBrcLeft,
The table sprms used to encode table properties in a PAPX stored in a PAPX FKP are: sprmTJc, sprmTDxaGapHalf,
sprmTDyaRowHeight, sprmTDefTableShd, and sprmTDefTable.
The section sprms used to encode section properties in a SEPX are:
sprmSBkc, sprmSFTitlePage, sprmSCcolumns, sprmSNfcPgn, sprmSPgnStart, sprmSFAutoPgn, sprmSDyaPgn,
sprmSDxaPgn, sprmSFPgnRestart, sprmSFEndnote, sprmSLnc, sprmSGrpfIhdt, sprmSNLnnMod, sprmSDxaLnn,
sprmSDyaHdrTop, sprmSDyaHdrBottom.
sprmPStcPermute (opcode 3) is a complex sprm which is applied to a piece when the style codes of paragraphs within a
piece must be mapped to other style codes. It has the following format:
Field Size Comment
sprm byte opcode( ==3)
cch byte count of bytes (not including sprm and cch)
mpstcFromstcTo byte permutation mapping from original stc values to new stc
values
To interpret sprmPStcPermute, first check if pap.stc is greater than 0 and less than or equal to the cch stored in the sprm. If
not, the sprm has no effect. If it is, pap.stc is set to mpstcFromstcTo[pap.stc - 1]. sprmPStcPermute is only stored in
grpprls linked to a piece table.
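The interpretation rule can be sketched in a few lines. The function name apply_stc_permute is hypothetical, and the sprm is passed as its raw bytes starting at the opcode.

```python
def apply_stc_permute(stc, sprm: bytes):
    """Return the new pap.stc after applying a sprmPStcPermute."""
    if sprm[0] != 3:
        raise ValueError("not a sprmPStcPermute (opcode 3)")
    cch = sprm[1]
    mpstc_from_stc_to = sprm[2:2 + cch]       # the permutation vector
    if 0 < stc <= cch:
        return mpstc_from_stc_to[stc - 1]     # the vector is indexed from stc 1
    return stc                                # outside the vector: no effect


# Maps stc 1 -> 5, 2 -> 1, 3 -> 2 and leaves every other stc alone.
sprm = bytes([3, 3, 5, 1, 2])
new_stc = apply_stc_permute(1, sprm)
```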
sprmPIncLvl (opcode 4) is applied to pieces in the piece table that contain paragraphs with style codes greater than or
equal to 247 and less than or equal to 255. These style codes identify heading levels in a Word outline structure. The
sprm causes a set of paragraphs to be changed to a new heading level. The sprm is two bytes long and consists of the
sprm code and a one byte two’s complement value.
If pap.stc is < 247, sprmPIncLvl has no effect. Otherwise, if the value stored in the byte has its highest order bit off, the
value is a positive difference which should be subtracted from pap.stc and then pap.stc should be set to max(pap.stc,
247). If the byte value has its highest order bit on, the value is a negative difference which should be sign extended to a
word and then subtracted from pap.stc. Then pap.stc should be set to min(255, pap.stc). sprmPIncLvl is only stored in
grpprls linked to a piece table.
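The two branches are easy to get backwards, so a short sketch may help. The function name apply_inc_lvl is hypothetical; param is the raw parameter byte (0 to 255) as stored in the sprm.

```python
def apply_inc_lvl(stc, param):
    """Return the new pap.stc after applying the sprm with parameter byte param."""
    if stc < 247:
        return stc                        # not a heading style: no effect
    if param & 0x80 == 0:
        # High bit off: positive difference, subtract and stop at 247.
        return max(stc - param, 247)
    # High bit on: sign extend the byte, subtract, and stop at 255.
    delta = param - 256
    return min(255, stc - delta)


demoted = apply_inc_lvl(254, 1)           # heading 1 (254) becomes heading 2 (253)
promoted = apply_inc_lvl(253, 0xFF)       # heading 2 (253) becomes heading 1 (254)
```

Because heading style codes count downward from 254, subtracting a positive value moves a paragraph to a deeper heading level.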
The sprmPChgTabsPapx (opcode 15) is a complex sprm that describes changes in tab settings from the underlying style.
It is only stored as part of PAPXs stored in FKPs and in the STSH. It has the following format:
Field Size Comment
sprm byte opcode
cch byte count of bytes (not including sprm and cch)
itbdDelMax byte number of tabs to delete
rgdxaDel int[itbdDelMax] array of tab positions for which tabs should be deleted
itbdAddMax byte number of tabs to add
rgdxaAdd int[itbdAddMax] array of tab positions for which tabs should be added
rgtbdAdd byte[itbdAddMax] array of tab descriptors corresponding to rgdxaAdd
When sprmPChgTabsPapx is interpreted, the rgdxaDel of the sprm is applied first to the pap that is being transformed.
This is done by deleting from the pap the rgdxaTab entry and rgtbd entry of any tab whose rgdxaTab value is equal to
one of the rgdxaDel values in the sprm. It is guaranteed that the entries in pap.rgdxaTab and the sprm's rgdxaDel and
rgdxaAdd are recorded in ascending dxa order.
Then the rgdxaAdd and rgtbdAdd entries are merged into the pap’s rgdxaTab and rgtbd arrays so that the resulting pap
rgdxaTab is sorted in ascending order with no duplicates.
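The delete-then-merge procedure might be sketched as below. The function name apply_chg_tabs_papx is hypothetical, each "int" is assumed to be a 2-byte little-endian signed integer, and where an added tab falls on a position that is still present the sketch lets the added descriptor replace the old one (the text only requires that no duplicates remain).

```python
import struct


def apply_chg_tabs_papx(sprm: bytes, rgdxa_tab, rgtbd):
    """Return the new (rgdxaTab, rgtbd) lists after applying the sprm."""
    pos = 2                                        # skip opcode and cch
    n_del = sprm[pos]
    pos += 1
    rgdxa_del = struct.unpack_from("<%dh" % n_del, sprm, pos)
    pos += 2 * n_del
    n_add = sprm[pos]
    pos += 1
    rgdxa_add = struct.unpack_from("<%dh" % n_add, sprm, pos)
    pos += 2 * n_add
    rgtbd_add = sprm[pos:pos + n_add]

    # Step 1: drop every tab whose position matches an rgdxaDel entry.
    tabs = {dxa: tbd for dxa, tbd in zip(rgdxa_tab, rgtbd)
            if dxa not in rgdxa_del}
    # Step 2: merge the additions (assumption: an added tab wins a collision).
    tabs.update(zip(rgdxa_add, rgtbd_add))
    merged = sorted(tabs.items())
    return [dxa for dxa, _ in merged], [tbd for _, tbd in merged]


payload = (bytes([1]) + struct.pack("<h", 1440) +
           bytes([2]) + struct.pack("<2h", 720, 2880) + bytes([1, 2]))
sprm = bytes([15, len(payload)]) + payload
new_tabs, new_tbds = apply_chg_tabs_papx(sprm, [1440, 2880, 4320], [0, 0, 0])
```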
sprmPNest (opcode 18) causes its operand, a two-byte dxa value to be added to pap.dxaLeft. If the result of the addition
The sprmPChgTabs (opcode 23) is a complex sprm which describes changes in tab settings for any paragraph within a
piece. It is only stored as part of a grpprl linked to a piece table. It has the following format:
Field Size Comment
sprm byte opcode
cch byte count of bytes (not including sprm and cch)
itbdDelMax byte number of tabs to delete
rgdxaDel int[itbdDelMax] array of tab positions for which tabs should be deleted
rgdxaClose int[itbdDelMax] array of tolerances corresponding to rgdxaDel where
each tolerance defines an interval around
corresponding rgdxaDel entry within which all tabs
should be removed
itbdAddMax byte number of tabs to add
rgdxaAdd int[itbdAddMax] array of tab positions for which tabs should be added
rgtbdAdd byte[itbdAddMax] array of tab descriptors corresponding to rgdxaAdd
itbdDelMax and itbdAddMax are defined to be equal to 50. This means that the largest possible instance of
sprmPChgTabs is 354. When the length of the sprm is greater than or equal to 255, the cch field will be set equal to 255.
When cch == 255, the actual length of the sprm can be calculated as follows: length = 2 + itbdDelMax * 4 +
itbdAddMax * 3.
When sprmPChgTabs is interpreted, the rgdxaDel of the sprm is applied first to the pap that is being transformed. This
is done by deleting from the pap the rgdxaTab entry and rgtbd entry of any tab whose rgdxaTab value is within the
interval [rgdxaDel[i] - rgdxaClose[i], rgdxaDel[i] + rgdxaClose[i]]. It is guaranteed that the entries in pap.rgdxaTab and
the sprm's rgdxaDel and rgdxaAdd are recorded in ascending dxa order.
Then the rgdxaAdd and rgtbdAdd entries are merged into the pap’s rgdxaTab and rgtbd arrays so that the resulting pap
rgdxaTab is sorted in ascending order with no duplicates.
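What distinguishes sprmPChgTabs from sprmPChgTabsPapx is the tolerance applied during deletion. The sketch below covers only that step; the function name is hypothetical and the arrays are assumed to be already unpacked from the sprm into Python lists.

```python
def delete_tabs_within_tolerance(rgdxa_tab, rgtbd, rgdxa_del, rgdxa_close):
    """Remove every tab that lies inside any deletion interval.

    A tab at position dxa is removed when
    rgdxaDel[i] - rgdxaClose[i] <= dxa <= rgdxaDel[i] + rgdxaClose[i]
    holds for some i.
    """
    kept = [
        (dxa, tbd)
        for dxa, tbd in zip(rgdxa_tab, rgtbd)
        if not any(d - c <= dxa <= d + c
                   for d, c in zip(rgdxa_del, rgdxa_close))
    ]
    return [dxa for dxa, _ in kept], [tbd for _, tbd in kept]


# One deletion at 1440 with a tolerance of 20 removes both 1440 and 1450.
tabs, tbds = delete_tabs_within_tolerance(
    [700, 1440, 1450, 2880], [0, 1, 2, 3], [1440], [20])
```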
The sprmPPc (opcode 29) is a complex sprm which describes changes in the pap.pcHorz and pap.pcVert. It is able to
change both fields’ contents in parallel. It has the following format:
Dec  Hex  field   type  size  bitfield  comments
0    0    sprm    byte                  opcode
1    1            int   :4    F0        reserved
          pcVert  int   :2    0C        if pcVert == 3, pap.pcVert should not be
                                        changed. Otherwise, contains new value
                                        of pap.pcVert.
          pcHorz  int   :2    03        if pcHorz == 3, pap.pcHorz should not be
                                        changed. Otherwise, contains new value
                                        of pap.pcHorz.
Length of sprmPPc is two bytes.
sprmPPc is interpreted by moving pcVert to pap.pcVert if pcVert != 3 and by moving pcHorz to pap.pcHorz if pcHorz !=
3. sprmPPc is stored in PAPX FKPs and also in grpprls linked to piece table entries.
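Decoding the parameter byte is a matter of masking and shifting. The sketch below uses the bitfield masks listed in the format table (0C for pcVert, 03 for pcHorz); the function name apply_ppc is hypothetical.

```python
def apply_ppc(sprm: bytes, pc_vert, pc_horz):
    """Return the new (pap.pcVert, pap.pcHorz) after applying a sprmPPc."""
    if sprm[0] != 29:
        raise ValueError("not a sprmPPc (opcode 29)")
    param = sprm[1]
    new_vert = (param & 0x0C) >> 2        # mask 0C per the format table
    new_horz = param & 0x03               # mask 03 per the format table
    if new_vert != 3:                     # 3 means "leave unchanged"
        pc_vert = new_vert
    if new_horz != 3:
        pc_horz = new_horz
    return pc_vert, pc_horz


# 0x07 = binary 0000 0111: pcVert field is 1, pcHorz field is 3 (unchanged).
result = apply_ppc(bytes([29, 0x07]), 2, 2)
```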
sprmCDefault (opcode 57) clears the fBold, fItalic, fOutline, fStrike, fShadow, fSmallCaps, fCaps, fVanish, kul and ico
fields of the chp to 0. It was first defined for Word 3.01 and had to be backward compatible with Word 3.00 so it is a
variable length sprm whose count of bytes is 0. It consists of the sprmCDefault opcode followed by a byte of 0.
sprmCDefault is stored only in grpprls linked to piece table entries.
sprmCPlain (opcode 58) is used to make the character properties of runs of text equal to the style character properties of
the paragraph that contains the text. When Word interprets this sprm, the style sheet CHP is copied over the original
CHP preserving the fSpec setting from the original CHP. sprmCPlain is stored only in grpprls linked to piece table entries.
sprms 60 through 67 (sprmCFBold through sprmCFVanish) set single bit properties in the CHP. When the parameter of
the sprm is set to 0 or 1, then the CHP property is set to the parameter value.
When the parameter of the sprm is 128, then the CHP property is set to the value that is stored for the property in the
style sheet CHP. When the parameter of the sprm is 129, the CHP property is set to the negation of the value that is
stored for the property in the style sheet CHP. sprmCFBold through sprmCFVanish are stored only in grpprls linked to
piece table entries.
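The four parameter values can be summarized in one small function. This is a sketch only: apply_toggle is a hypothetical name, and style_value stands for the value of the same property in the style sheet CHP of the containing paragraph.

```python
def apply_toggle(param, style_value):
    """Return the new value of a single-bit CHP property.

    param is the sprm parameter byte; style_value is the value the style
    sheet CHP stores for the same property.
    """
    if param in (0, 1):
        return bool(param)           # explicit off / on
    if param == 128:
        return bool(style_value)     # same as the style
    if param == 129:
        return not style_value       # opposite of the style
    raise ValueError("parameter must be 0, 1, 128 or 129")


# Text in a bold style with parameter 129 is rendered not bold.
bold = apply_toggle(129, True)
```

The values 128 and 129 let a run follow or invert its style's setting, so the run keeps its relative emphasis when the style itself is changed later.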
sprmCSizePos (opcode 70) is a four byte sprm consisting of the sprm opcode and a three byte parameter. The sprm has
the following format:
Dec  Hex  field    type  size  bitfield  comments
0    0    sprm     byte                  opcode
1    1    hpsSize  int   :8    FF        when != 0, contains new size of chp.hps
2    2    cInc     int   :7    FE        contains the number of font levels to
                                         increase or decrease size of chp.hps as a
                                         twos complement value.
          fAdjust  int   :1    01        when == 1, means that chp.hps should be
                                         adjusted up/down by one font level for
                                         super/subscripting change
3    3    hpsPos   int   :8    FF        when != 128, contains super/subscript
                                         position as a twos complement number
When Word interprets this sprm, if hpsSize != 0 then chp.hps is set to hpsSize. If cInc is != 0, the cInc is interpreted as a
7 bit twos complement number and the procedure described below for interpreting sprmCHpsInc is followed to increase
or decrease the chp.hps by the specified number of levels. If hpsPos is != 128, then chp.hpsPos is set equal to hpsPos. If
fAdjust is on, hpsPos != 128 and hpsPos != 0 and the previous value of chp.hpsPos == 0, then chp.hps is reduced by one
level following the method described for sprmCHpsInc. If fAdjust is on, hpsPos == 0 and the previous value of
chp.hpsPos != 0, then the chp.hps value is increased by one level using the method described below for sprmCHpsInc.
sprmCHpsInc (opcode 75) is a two-byte sprm consisting of the sprm opcode and a one-byte parameter. Word keeps an
ordered array of the font sizes that are defined for the fonts recorded in the system file with each font size transformed
into an hps. The parameter is a one-byte twos complement number. Word uses this number to calculate an index in the
font size array to determine the new hps for a run. When Word interprets this sprm and the parameter is positive, it
searches the array of font sizes to find the index of the smallest entry in the font size table that is greater than the current
chp.hps. It then adds the parameter minus 1 to the index and takes the minimum of this and the index of the last array
entry. It uses the result as an index into the font size array and assigns that entry of the array to chp.hps.
When the parameter is negative, Word searches the array of font sizes to find the index of the entry that is less than or
equal to the current chp.hps. It then adds the negative parameter to the index and takes the maximum of the result and 0.
The result is used as an index into the font size array and that entry of the array is assigned to chp.hps.
sprmCHpsInc is stored only in grpprls linked to piece table entries.
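The level-stepping procedure might look like the sketch below. Several things here are assumptions rather than facts from the format: the function name apply_hps_inc, the representation of the font size array as an ascending Python list, the choice of the largest entry not exceeding chp.hps in the negative case, and the clamping of the computed index into the valid range of the array.

```python
from bisect import bisect_right


def apply_hps_inc(hps, param, sizes):
    """Return the new chp.hps after stepping param font levels.

    sizes is an ascending list of available hps values; param is the sprm
    parameter already converted to a signed integer.
    """
    last = len(sizes) - 1
    if param > 0:
        index = bisect_right(sizes, hps)          # smallest entry > hps
        index = min(index + param - 1, last)      # clamp to the last entry
    elif param < 0:
        index = bisect_right(sizes, hps) - 1      # largest entry <= hps
        index = max(index + param, 0)             # clamp to the first entry
    else:
        return hps
    return sizes[index]


sizes = [16, 20, 24, 28, 36, 48]                  # hypothetical size table
bigger = apply_hps_inc(20, 1, sizes)              # next size above 20
smaller = apply_hps_inc(24, -1, sizes)            # next size below 24
```

Note that the result depends on the fonts installed on the system doing the interpretation, since the size array is built from them.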
sprmCHpsPosAdj (opcode 77) causes the hps of a run to be reduced the first time text is superscripted or subscripted and
causes the hps of a run to be increased when superscripting/subscripting is removed from a run. The one byte parameter
of this sprm is the new hpsPos value that is to be stored in chp.hpsPos. If the new hpsPos is not equal to 0 (meaning that the
text is to be super/subscripted), Word first examines the current value of chp.hpsPos to see if it is equal to 0. If so, Word
uses the algorithm described for sprmCHpsInc to decrease chp.hps by one level. If the new hpsPos == 0 (meaning the
text is not super/subscripted), Word examines the current chp.hpsPos to see if it is not equal to 0. If it is not (which
means text is being restored to normal position), Word uses the sprmCHpsInc algorithm to increase chp.hps by one level.
After chp.hps is adjusted, the parameter value is stored in chp.hpsPos. sprmCHpsPosAdj is stored only in grpprls linked
to piece table entries.
The parameter of sprmCMajority (opcode 78) is the first 8 bytes of a CHP which encodes a criterion under which certain
value as the field stored in the sprm, then that field is reset to the value stored in the style’s CHP. If the two copies differ,
then the original CHP value is left unchanged. sprmCMajority is stored only in grpprls linked to piece table entries.
sprmPicScale (opcode 95) is used to scale the x and y dimensions of a Word picture and to set the cropping for each side
of the picture. The sprm begins with the one byte opcode, followed by the length of the parameter (always 12) stored in a
byte. The 12-byte long operand consists of an array of 6 two-byte integer fields. The 0th integer contains the new setting
for pic.mx. The 1st integer contains the new setting for pic.my. The 2nd integer contains the new setting for
pic.dxaCropLeft. The 3rd integer contains the new setting for pic.dyaCropTop. The 4th integer contains the new setting
for pic.dxaCropRight. The 5th integer contains the new setting of pic.dyaCropBottom. sprmPicScale is stored only in
grpprls linked to piece table entries.
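Unpacking the operand is straightforward. In this sketch the function name parse_pic_scale is hypothetical and the six integers are assumed to be little-endian and signed; the field names follow the PIC fields listed for opcode 95 in the sprm table.

```python
import struct

PIC_SCALE_FIELDS = ("mx", "my", "dxaCropLeft", "dyaCropTop",
                    "dxaCropRight", "dyaCropBottom")


def parse_pic_scale(sprm: bytes):
    """Return a dict of the six PIC fields carried by a sprmPicScale."""
    if sprm[0] != 95:
        raise ValueError("not a sprmPicScale (opcode 95)")
    if sprm[1] != 12:
        raise ValueError("parameter length must be 12")
    # Assumption: six 2-byte little-endian signed integers.
    values = struct.unpack_from("<6h", sprm, 2)
    return dict(zip(PIC_SCALE_FIELDS, values))


sprm = bytes([95, 12]) + struct.pack("<6h", 1000, 500, 0, 144, 0, -72)
fields = parse_pic_scale(sprm)
```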
sprmTDxaLeft (opcode 147) is called to adjust the x position within a column which marks the left boundary of text
within the first cell of a table row. This sprm causes a whole table row to be shifted left or right within its column
leaving the horizontal width and vertical height of cells in the row unchanged. Byte 0 of the sprm contains the opcode,
and the new dxa position, call it dxaNew, is stored as an integer in bytes 1 and 2. Word interprets this sprm by adding
dxaNew - (rgdxaCenter[0] + tap.dxaGapHalf) to every entry of tap.rgdxaCenter whose index is less than tap.itcMac.
sprmTDxaLeft is stored only in grpprls linked to piece table entries.
sprmTDxaGapHalf (opcode 148) adjusts the white space that is maintained between columns by changing
tap.dxaGapHalf. Because we want the left boundary of text within the leftmost cell to be at the same location after the
sprm is applied, Word also adjusts tap.rgdxaCenter[0] by the amount that tap.dxaGapHalf changes. Byte 0 of the sprm
contains the opcode, and the new dxaGapHalf, call it dxaGapHalfNew, is stored in bytes 1 and 2. When the sprm is
interpreted, the change between the old and new dxaGapHalf values, tap.dxaGapHalf - dxaGapHalfNew, is added to
tap.rgdxaCenter[0] and then dxaGapHalfNew is moved to tap.dxaGapHalf. sprmTDxaGapHalf is stored in PAPXs and
also in grpprls linked to piece table entries.
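A short sketch shows why the left text boundary stays put. It assumes, as the sprmTDxaLeft description implies, that the boundary is rgdxaCenter[0] + dxaGapHalf; the function name apply_dxa_gap_half is hypothetical.

```python
def apply_dxa_gap_half(rgdxa_center, dxa_gap_half, dxa_gap_half_new):
    """Return the new (rgdxaCenter, dxaGapHalf) after sprmTDxaGapHalf."""
    centers = list(rgdxa_center)
    # Shift the first boundary by the change in the gap (old minus new).
    centers[0] += dxa_gap_half - dxa_gap_half_new
    return centers, dxa_gap_half_new


old_centers, old_gap = [-108, 1440, 2880], 108
new_centers, new_gap = apply_dxa_gap_half(old_centers, old_gap, 72)
# Left text boundary before and after the change:
before = old_centers[0] + old_gap
after = new_centers[0] + new_gap
```

Only the first entry moves; the remaining cell boundaries are left as they were.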
sprmTDefTable10 (opcode 152) is an obsolete version of sprmTDefTable (opcode 154) that was used in Word for
Windows 1.x. Its contents are identical to those in sprmTDefTable, except that the TC structures contain the obsolete
structures BRC10s.
sprmTDefTable (opcode 154) defines the boundaries of table cells (tap.rgdxaCenter) and the properties of each cell in a
table (tap.rgtc). The 0th byte of the sprm contains its opcode. Bytes 1 and 2 store a two-byte length of the following
parameter. Byte 3 contains the number of cells that are to be defined by the sprm, call it itcMac. When the sprm is
interpreted, itcMac is moved to tap.itcMac. itcMac cannot be larger than 32. Bytes 4 through 4 + 2*(itcMac + 1) - 1 store
an array of integer dxa values sorted in ascending order which will be moved to tap.rgdxaCenter. Bytes 4 +
2*(itcMac + 1) through 4 + 2*(itcMac + 1) + 10*itcMac - 1 store an array of TC entries corresponding to the
stored tap.rgdxaCenter. This array is moved to tap.rgtc. sprmTDefTable is only stored in PAPXs.
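The byte layout of sprmTDefTable can be read with a short Python sketch such as the one below. The function name is hypothetical, integers are assumed to be little-endian, and each TC is returned as an uninterpreted 10-byte string because the TC layout is described elsewhere.

```python
import struct

def parse_sprm_t_def_table(sprm):
    """Split a sprmTDefTable (opcode 154) into (itcMac, rgdxaCenter, rgtc)."""
    # Byte 0: opcode, bytes 1-2: length of the parameter, byte 3: itcMac.
    opcode, cb, itc_mac = struct.unpack_from("<BHB", sprm, 0)
    if opcode != 154:
        raise ValueError("not sprmTDefTable")
    if itc_mac > 32:
        raise ValueError("itcMac cannot be larger than 32")
    tc_start = 4 + 2 * (itc_mac + 1)
    if len(sprm) < tc_start + 10 * itc_mac:
        raise ValueError("sprm is truncated")
    # itcMac + 1 cell boundaries (assumed signed 16-bit values).
    rgdxa_center = list(struct.unpack_from("<%dh" % (itc_mac + 1), sprm, 4))
    # itcMac TC entries of 10 bytes each, left as raw bytes.
    rgtc = [sprm[tc_start + 10 * i: tc_start + 10 * (i + 1)] for i in range(itc_mac)]
    return itc_mac, rgdxa_center, rgtc
```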
sprmTDefTableShd (opcode 155) is similar to sprmTDefTable, and complements it by defining the shading of each cell
in a table (tap.rgshd). The 0th byte of the sprm contains its opcode. Bytes 1 and 2 store a two-byte length of the
following parameter. Byte 3 contains the number of cells that are to be defined by the sprm, call it itcMac. itcMac
cannot be larger than 32. In bytes 4 through 4 + 2*(itcMac + 1) - 1 is stored an array of SHDs. This array is moved to
tap.rgshd. sprmTDefTableShd is only stored in PAPXs.
sprmTInsert (opcode 158) inserts new cell definitions in an existing table’s cell structure. The 0th byte of the sprm
contains the opcode. Byte 1 is the index within tap.rgdxaCenter and tap.rgtc at which the new dxaCenter and tc values
will be inserted. Call this index itcInsert. Byte 2 contains a count of the cell definitions to be added to the tap, call it ctc.
Bytes 3 and 4 contain the width of the cells that will be added, call it dxaCol. If there are already cells defined at the
index where cells are to be inserted, tap.rgdxaCenter entries at or above this index must be moved to the entry ctc higher
and must be adjusted by adding ctc * dxaCol to the value stored. The contents of tap.rgtc at or above the index must be
moved 10 * ctc bytes higher in tap.rgtc. If itcInsert is greater than the original tap.itcMac, itcInsert - tap.ctc columns
beginning with index tap.itcMac must be added of width dxaCol (loop from itcMac to itcMac +itcInsert-tap.ctc adding
were added to the tap is added to tap.itcMac. sprmTInsert is stored only in grpprls linked to piece table entries.
sprmTDelete (opcode 159) deletes cell definitions from an existing table’s cell structure. The 0th byte of the sprm
contains the opcode. Byte 1 contains the index of the first cell to delete, call it itcFirst. Byte 2 contains the index of the
cell that follows the last cell to be deleted, call it itcLim. sprmTDelete causes any rgdxaCenter and rgtc entries whose
index is greater than or equal to itcLim to be moved to the entry that is itcLim - itcFirst lower, and causes tap.itcMac to
be decreased by the number of cells deleted. sprmTDelete is stored only in grpprls linked to piece table entries.
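A literal Python reading of the sprmTDelete description is sketched below. The function name and the use of Python lists for tap.rgdxaCenter and tap.rgtc are assumptions made for illustration. The sketch only moves entries as the text states and does not otherwise adjust the stored boundary values.

```python
def apply_sprm_t_delete(sprm, rgdxa_center, rgtc, itc_mac):
    """Apply sprmTDelete (opcode 159); returns (rgdxaCenter, rgtc, itcMac)."""
    opcode, itc_first, itc_lim = sprm[0], sprm[1], sprm[2]
    if opcode != 159:
        raise ValueError("not sprmTDelete")
    if not 0 <= itc_first <= itc_lim <= itc_mac:
        raise ValueError("cell range lies outside the row")
    centers = list(rgdxa_center)
    cells = list(rgtc)
    # Entries at index >= itcLim move (itcLim - itcFirst) positions lower,
    # which is the same as deleting the slice [itcFirst, itcLim).
    del centers[itc_first:itc_lim]
    del cells[itc_first:itc_lim]
    return centers, cells, itc_mac - (itc_lim - itc_first)
```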
sprmTDxaCol (opcode 160) changes the width of cells whose index is within a certain range to be a certain value. The
0th byte of the sprm contains the opcode. Byte 1 contains the index of the first cell whose width is to be changed, call it
itcFirst. Byte 2 contains the index of the cell that follows the last cell whose width is to be changed, call it itcLim. Bytes
3 and 4 contain the new width of the cell, call it dxaCol. This sprm causes the itcLim - itcFirst entries of tap.rgdxaCenter
to be adjusted so that tap.rgdxaCenter[i+1] = tap.rgdxaCenter[i] + dxaCol. Any tap.rgdxaCenter entries that exist
beyond itcLim are adjusted to take into account the amount added to or removed from the previous columns.
sprmTDxaCol is stored only in grpprls linked to piece table entries.
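The width change made by sprmTDxaCol can be sketched in Python as shown here. The function name is hypothetical, dxaCol is assumed to be a signed little-endian integer, and tap.rgdxaCenter is modelled as a list.

```python
import struct

def apply_sprm_t_dxa_col(sprm, rgdxa_center):
    """Apply sprmTDxaCol (opcode 160) and return the adjusted rgdxaCenter."""
    opcode, itc_first, itc_lim, dxa_col = struct.unpack_from("<BBBh", sprm, 0)
    if opcode != 160:
        raise ValueError("not sprmTDxaCol")
    centers = list(rgdxa_center)
    old_right = centers[itc_lim]
    # Give each cell in [itcFirst, itcLim) the width dxaCol.
    for i in range(itc_first, itc_lim):
        centers[i + 1] = centers[i] + dxa_col
    # Shift every boundary beyond itcLim by the total width gained or lost.
    shift = centers[itc_lim] - old_right
    for i in range(itc_lim + 1, len(centers)):
        centers[i] += shift
    return centers
```

Cells to the right of the range keep their original widths because their boundaries all move by the same amount.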
sprmTMerge (opcode 161) merges the display areas of cells within a specified range. The 0th byte of the sprm contains
the opcode. Byte 1 contains the index of the first cell that is to be merged, call it itcFirst. Byte 2 contains the index of the
cell that follows the last cell to be merged, call it itcLim. This sprm causes tap.rgtc[itcFirst].fFirstMerged to be set to 1.
Cells in the range whose index is greater than itcFirst and less than itcLim have tap.rgtc[].fMerged set to 1. sprmTMerge
is stored only in grpprls linked to piece table entries.
sprmTSplit (opcode 162) splits the display areas of merged cells into their originally assigned display areas. The 0th byte
of the sprm contains the opcode. Byte 1 contains the index of the first cell that is to be split, call it itcFirst. Byte 2
contains the index of the cell that follows the last cell to be split, call it itcLim. This sprm clears tap.rgtc[].fFirstMerged
and tap.rgtc[].fMerged for all rgtc entries >= itcFirst and < itcLim. sprmTSplit is stored only in grpprls linked to piece
table entries.
sprmTSetBrc (opcode 157) allows the border definitions (BRCs) within TCs to be set to new values. It has the following
format:
Dec  Hex  field          type  size  bitfield  comments
0    0    sprm           byte                  opcode 157
1    1    itcFirst       byte                  the index of the first cell that is to have its
                                               borders changed.
2    2    itcLim         byte                  index of the cell that follows the last cell to
                                               have its borders changed
3    3                   int   :4    F0        reserved
          fChangeRight   int   :1    08        =1 when tap.rgtc[].brcRight is to be
                                               changed
          fChangeBottom  int   :1    04        =1 when tap.rgtc[].brcBottom is to be
                                               changed
          fChangeLeft    int   :1    02        =1 when tap.rgtc[].brcLeft is to be
                                               changed
          fChangeTop     int   :1    01        =1 when tap.rgtc[].brcTop is to be
                                               changed
4    4    brc            BRC                   new BRC value to be stored in TCs.
This sprm changes the brc fields selected by the fChange* flags in the sprm to the brc value stored in the sprm, for every
tap.rgtc entry whose index is greater than or equal to itcFirst and less than itcLim. sprmTSetBrc is stored only in grpprls
linked to piece table entries.
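As an illustration, the flag byte and BRC value of sprmTSetBrc could be applied as in the Python sketch below. The function name is hypothetical, each TC is modelled as a dictionary holding only its four brc fields, and the 2-byte BRC is assumed to be stored little-endian.

```python
import struct

# Flag masks taken from the bitfield column of the format table.
BRC_FLAGS = [(0x01, "brcTop"), (0x02, "brcLeft"), (0x04, "brcBottom"), (0x08, "brcRight")]

def apply_sprm_t_set_brc(sprm, rgtc):
    """Apply sprmTSetBrc (opcode 157) in place to a list of TC dictionaries."""
    opcode, itc_first, itc_lim, flags, brc = struct.unpack_from("<BBBBH", sprm, 0)
    if opcode != 157:
        raise ValueError("not sprmTSetBrc")
    # Every TC with itcFirst <= index < itcLim receives the new BRC
    # in each border field whose fChange* flag is set.
    for tc in rgtc[itc_first:itc_lim]:
        for mask, field in BRC_FLAGS:
            if flags & mask:
                tc[field] = brc
    return rgtc
```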
4 contain the SHD structure, call it shd. This sprm causes the itcLim - itcFirst entries of tap.rgshd to be set to shd.
sprmTDxaCol is stored only in grpprls linked to piece table entries.
Complex File Format
The complex file format is used when a file is fast-saved. A complex file has fib.fComplex set to 1. In a complex file,
fcClx is the fc where the complex part of the file begins, and cbClx is the size (in bytes) of the complex part. The
complex part of the file contains a group of grpprls that encode formatting changes made by the user and a piece table
(plcfpcd). The piece table is needed because the text of the document is not stored contiguously in the file after a fast
save.
The complex part of a file (CLX) is composed of a number of variable-sized blocks of data. Recorded first are any
grpprls that may be referenced by the plcfpcd (if the plcfpcd has no grpprl references, no grpprls will be recorded)
followed by the plcfpcd. Each block in the complex part is prefaced by a clxt (clx type), which is a 1-byte code, either 1
(meaning the block contains a grpprl) or 2 (meaning this is the plcfpcd). In both cases, the clxt is followed by a 2-byte
cb which is the count of bytes of the grpprl or the piece table. So the formats of the two types of blocks are:
clxt = 1 clxtGrpprl
cb count of bytes in grpprl
grpprl see "Definitions" for description of grpprl; a grpprl can contain sprms modifying
character, paragraph, table, section or picture properties
or
clxt = 2 clxtPlcfpcd
cb count of bytes in piece table
plcfpcd piece table
The entire CLX would look like this, depending on the number of grpprls:
clxtGrpprl
cb
grpprl (0th grpprl)
clxtGrpprl
cb
grpprl (1st grpprl)
...
clxtPlcfpcd
cb
plcfpcd
When the prm of a PCD stored in the plcfpcd contains an igrpprl (index to a grpprl), the index stored is the position of
that grpprl in the order in which the grpprls were stored in the CLX (the first grpprl recorded has index 0).
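The block structure of the CLX lends itself to a simple loop. The following Python sketch is one possible reading, assuming the 2-byte cb is little-endian and that the CLX bytes (cbClx bytes starting at fcClx) have already been read into memory. The function name is hypothetical.

```python
import struct

def parse_clx(clx):
    """Return (grpprls, plcfpcd) from the raw bytes of the complex part."""
    grpprls = []
    pos = 0
    while pos < len(clx):
        # Each block starts with a 1-byte clxt and a 2-byte count of bytes.
        clxt, cb = struct.unpack_from("<BH", clx, pos)
        pos += 3
        body = clx[pos:pos + cb]
        if len(body) != cb:
            raise ValueError("CLX block is truncated")
        pos += cb
        if clxt == 1:        # clxtGrpprl
            grpprls.append(body)
        elif clxt == 2:      # clxtPlcfpcd, always the last block
            return grpprls, body
        else:
            raise ValueError("unknown clxt %d" % clxt)
    raise ValueError("no plcfpcd found in CLX")
```

Because the grpprls are collected in file order, an igrpprl taken from a PCD's prm can be used directly as an index into the returned list.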
Algorithm to determine the bounds of a paragraph containing a certain character in a
complex file
When a document is recorded in non-complex format, the bounds of the paragraph that contains a particular character
can be found by calculating the FC coordinate of the character, searching the bin table to find an FKP page that describes
that FC, fetching that FKP, and then searching the FKP to find the interval in the rgfc that encloses the character. The
When a document is recorded in complex format, a piece that was originally part of one paragraph can be copied or
moved within a different paragraph. To find the beginning of the paragraph containing a character in a complex
document, it’s first necessary to search for the piece containing the character in the piece table. Then calculate the FC in
the file that stores the character from the piece table information. Using the FC, search the FC’s FKP for the largest FC
less than the character’s FC, call it fcTest. If the character at fcTest-1 is contained in the current piece, then the character
corresponding to that FC in the piece is the first character of the paragraph. If that FC is before or marks the beginning of
the piece, scan a piece at a time towards the beginning of the piece table until a piece is found that contains a paragraph
mark. This can be done by using the end of the piece FC, finding the largest FC in its FKP that is less than or equal to the
end of piece FC, and checking to see if the character in front of the FKP FC (which must mark a paragraph end) is within
the piece. When such an FKP FC is found, the FC marks the first byte of paragraph text.
To find the end of a paragraph for a character in a complex format file, again it is necessary to know the piece that
contains the character and the FC assigned to the character. Using the FC of the character, first search the FKP that
describes the character to find the smallest FC in the rgfc that is larger than the character FC. If the FC found in the FKP
is less than or equal to the limit FC of the piece, the end of the paragraph that contains the character is at the FKP FC
minus 1. If the FKP FC that was found was greater than the FC of the end of the piece, scan piece by piece toward the
end of the document until a piece is found that contains a paragraph end mark. It’s possible to check if a piece contains a
paragraph mark by using the FC of the beginning of the piece to search in the FKPs for the smallest FC in the FKP rgfc
that is greater than the FC of the beginning of the piece. If the FC found is less than or equal to the limit FC of the piece,
then the character that ends the paragraph is the character immediately before the FKP FC.
A special procedure must be followed to locate the last paragraph of the main document text when footnote or
header/footer text is saved in a Word file (i.e. when fib.ccpFtn != 0 or fib.ccpHdr != 0).
In this case the CP of that paragraph mark is fib.ccpText + fib.ccpFtn + fib.ccpHdr + fib.ccpMcr + fib.ccpAtn and the
limit CP of the entire plcfpcd is fib.ccpText + fib.ccpFtn + fib.ccpHdr + fib.ccpMcr + fib.ccpAtn + 1.
Algorithm to determine paragraph properties for a paragraph in a complex file
Having found the index i of the FC in an FKP that marks the character stored in the file immediately after the
paragraph’s paragraph mark, it is necessary to use fkp.rgb[i - 1] to find the PAPX for the paragraph. Using papx.stc to
index into the properties stored for the style sheet, the paragraph properties of the style are copied to a local PAP. Then
the grpprl stored in the PAPX is applied to the local PAP, and papx.stc along with papx.phe are moved into the local
PAP. The process thus far has created a PAP that describes what the paragraph properties of the paragraph were at the
last full save. Now it’s necessary to apply any paragraph sprms that were linked to the piece that contains the
paragraph’s paragraph mark. If pcd.prm.fComplex is 0, pcd.prm contains 1 sprm which should only be applied to the
local PAP if it is a paragraph sprm. If pcd.prm.fComplex is 1, pcd.prm.igrpprl is the index of a grpprl in the CLX. If that
grpprl contains any paragraph sprms, they should be applied to the local PAP. After applying all of the sprms for the
piece, the local PAP contains the correct paragraph property values.
Algorithm to determine table properties for a table row in a complex file
To determine the table properties for a table row in a complex file, scan paragraph-by-paragraph toward the end of the
table row, until a paragraph is found that has pap.fTtp set to 1. This paragraph consists of a single row end character.
This row end character is linked to the table properties of the row. To create the TAP for the table row, clear a local TAP
to zeros. Then the PAPX for the row end character must be fetched from an FKP, and the table sprms that are stored in
this PAPX must be applied to the local TAP. The process thus far has created a TAP that describes what the table
properties of the table row were at the last full save. Now apply any table sprms that were linked to the piece that
contains the table row’s row end character. If pcd.prm.fComplex is 0, pcd.prm contains 1 sprm which should be applied
to the local TAP if it is a table sprm. If pcd.prm.fComplex is 1, pcd.prm.igrpprl is the index of a grpprl in the CLX. If
that grpprl contains any table sprms, apply them to the local TAP. After all of the sprms for the piece are applied, the
local TAP contains the correct table property values for the table row.
Algorithm to determine the character properties of a character in a complex file
character properties recorded in the style sheet for that style are copied into a local CHP. Then, the piece containing the
character is located in the piece table (plcfpcd) and the fc of the character is calculated. Using the character’s FC, the
page number of the CHPX FKP that describes the character is found by searching the bin table (hplcfbteChpx). The
CHPX FKP stored in that page is fetched and then the rgfc in the FKP is searched to locate the bounds of the run of
exception text that encompasses the character. The CHPX for that run is then located within the FKP, and the CHPX is
applied to the contents of the local CHP. The process thus far has created a CHP that describes what the character
properties of the character were at the last full save. Now apply any character sprms that were linked to the piece that
contains the character. If pcd.prm.fComplex is 0, pcd.prm contains 1 sprm which should be applied to the local CHP if it
is a character sprm. If pcd.prm.fComplex is 1, pcd.prm.igrpprl is the index of a grpprl in the CLX. If that grpprl contains
any character sprms, apply them to the local CHP. After applying all of the sprms for the piece, the local CHP contains
the correct properties for the character.
Characters that are within the same piece, same paragraph, and same run of exception text are guaranteed to have the
same properties. This fact can be used to construct a scanner that can return the limit CPs and properties of a sequence of
characters that all have the same properties.
Algorithm to determine the section properties of a section in a complex file
To determine which section a character belongs to and what its section properties are, it is necessary to use the CP of the
character to search the plcfsed for the index i of the largest CP that is less than or equal to the character’s CP.
plcfsed.rgcp[i] is the CP of the first character of the section and plcfsed.rgcp[i+1] is the CP of the character following the
section mark that terminates the section (call it cpLim). Then retrieve plcfsed.rgsed[i]. The FC in this SED gives the
location where the SEPX for the section is stored. Then create a local SEP with default section properties. If the sed.fc !=
0xFFFFFFFF, then the sprms within the SEPX that is stored at offset sed.fc must be applied to the local SEP. The
process thus far has created a SEP that describes what the section properties of the section were at the last full save. Now
apply any section sprms that were linked to the piece that contains the section’s section mark. If pcd.prm.fComplex is 0,
pcd.prm contains 1 sprm which should be applied to the local SEP if it is a section sprm. If pcd.prm.fComplex is 1,
pcd.prm.igrpprl is the index of a grpprl in the CLX. If that grpprl contains any section sprms, they should be applied to
the local SEP. After applying all of the section sprms for the piece, the local SEP contains the correct section properties.
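The search of the plcfsed for the section containing a CP can be sketched as a binary search. In the Python example below the function name is hypothetical and plcfsed.rgcp is assumed to have been read into a sorted list of n+1 CPs for n sections.

```python
from bisect import bisect_right

def find_section(rgcp, cp):
    """Return (i, cpFirst, cpLim) for the section that contains cp."""
    n = len(rgcp) - 1                     # number of sections
    if not rgcp[0] <= cp < rgcp[n]:
        raise ValueError("CP lies outside every section")
    # Index of the largest CP that is less than or equal to cp,
    # searching only the first n entries (the last entry is a limit).
    i = bisect_right(rgcp, cp, 0, n) - 1
    return i, rgcp[i], rgcp[i + 1]
```

The returned index i is then used to fetch plcfsed.rgsed[i], whose FC locates the SEPX for the section.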
Algorithm to determine the PIC of a picture in a complex file
The picture sprms contained in the prm's grpprl apply to any picture characters within the piece that have
chp.fSpec == fTrue. The picture properties for a picture (the PIC described in the Structure Definitions) are derived by
fetching the PIC stored with the picture and applying to that PIC any picture sprms linked to the piece containing the
picture special character.
Footnotes
In Word for Windows the text of a footnote is anchored to a particular position within the document’s main text, the
location of its footnote reference. There is a structure referenced by the fib, the plcffndRef, which records the locations
of the footnote references within the main text address space and another structure referenced by the fib, the plcffndTxt,
which records the beginning locations of corresponding footnote text within the footnote text address space. The
footnote text characters in a full saved file begin at offset fib.fcMin + fib.ccpText and extend to fib.fcMin +
fib.ccpText + fib.ccpFtn. In a complex fast-saved document, the footnote text begins at CP fib.ccpText and extends to
fib.ccpText + fib.ccpFtn. To find the location of the ith footnote reference in the main text address space, look up the ith
entry in the plcffndRef and find the location of the text corresponding to the reference within the footnote text address
space by looking up the ith entry in the plcffndTxt.
When there are n footnotes, the plcffndTxt structure consists of n+2 CP entries. The CP entries mark the beginning
The last character of footnote text for a footnote (i.e. the character at limit CP - 1) is always a paragraph end (ASCII 13).
If there are n footnotes, the n + 2nd CP entry value is always 1 greater than the n+1st CP entry value. A paragraph end
(ASCII 13) is always stored at the file position marked by the n+1st CP value.
When there are n footnotes, the plcffndRef structure consists of n+1 CP entries followed by n integer flags, named
fAuto. The ith CP in the plcffndRef corresponds to the ith fAuto flag. The CP entries give the locations of footnote
references within the main text address space. The n + 1st CP entry contains the value fib.ccpText + fib.ccpFtn +
fib.ccpHdr + 1. The fAuto flag contains 1 whenever the footnote reference name is auto-generated by Word.
When a footnote reference name is automatically generated by Word, Word generates the name by adding 1 to the index
number of the reference in the plcffndRef and translating that number to ASCII text. When the footnote reference is auto
generated, the character at the main text CP position for the footnote reference should be a footnote reference character
(ASCII 5) which has a chp recorded with chp.fSpec = 1.
The number of footnotes stored in a Word binary file can be found by dividing fib.cbPlcffndTxt by 4 and subtracting 1.
Headers and Footers
The header and footer text characters in a full saved file begin at offset fib.fcMin + fib.ccpText + fib.ccpFtn and extend
to fib.fcMin + fib.ccpText + fib.ccpFtn + fib.ccpHdr. In a complex fast-saved document, the header and footer text begins at CP
fib.ccpText + fib.ccpFtn and extends to fib.ccpText + fib.ccpFtn + fib.ccpHdr. The plcfhdd, a table whose location and
length within the file is stored in fib.fcPlcfhdd and fib.cbPlcfhdd, describes where the text of each header/footer begins.
If there are n headers/footers stored in the Word file, the plcfhdd consists of n + 2 CP entries. The beginning CP of the
ith header/footer is the ith CP in the plcfhdd. The limit CP (the CP of character 1 position past the end of a
header/footer) of the ith header/footer is the i+1st CP in the plcfhdd. Note that at the limit CP - 1, Word always places
a chEop as a place holder which is never displayed as part of the header/footer. This allows Word to change an existing
header/footer to be empty.
If there are n header/footers, the n + 2nd CP entry value is always 1 greater than the n+1st CP entry value. A paragraph
end (ASCII 13) is always stored at the file position marked by the n+1st CP value.
The transformation in a full saved file from a header/footer CP to an offset from the beginning of a file (fc) is fc =
fib.fcMin + ccpText + ccpFtn + cp.
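For a full saved file, the plcfhdd lookup and the CP-to-fc transformation can be combined as in the Python sketch below. The function and parameter names are hypothetical, the plcfhdd is assumed to be available as a list of CPs, and one byte per character is assumed.

```python
def header_footer_fc_range(plcfhdd, ihdd, fc_min, ccp_text, ccp_ftn):
    """Return (fcFirst, fcLim) of the displayed text of header/footer ihdd."""
    cp_first = plcfhdd[ihdd]          # beginning CP of the ith header/footer
    cp_lim = plcfhdd[ihdd + 1]        # limit CP of the ith header/footer
    base = fc_min + ccp_text + ccp_ftn
    # The chEop at limit CP - 1 is a place holder that is never displayed,
    # so the displayed text stops one character earlier.
    return base + cp_first, base + cp_lim - 1
```

A header/footer whose returned fcFirst equals fcLim is empty: only the place-holder chEop is stored for it.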
In Word for Windows, headers/footers can be defined for a document that:
1) will act as a separator between main text and footnote text
2) will print below footnote text on a page when footnote text must be continued on a succeeding page
(continuation separator)
3) will print above footnote text on a page when the text must be continued from a previous page (continuation
notice)
Also for each section defined for the document, distinct headers can be defined for printing on odd-numbered/right
facing pages, even-numbered /left facing pages and the first page of a section. Similarly for each document section,
distinct footers can be defined for printing on odd-numbered/right facing pages, even-numbered/left facing pages and the
first page of a section.
Within the document and the section properties of a document (the DOP and SEP) is a field, the grpfIhdt, which
enumerates which of the header/footer types are defined for the document or for a particular section. The grpfIhdt in both
corresponding to the bit is defined for the document or for a particular section.
Definition of the bits of dop.grpfIhdt:
Bit position
7 footnote separator defined when == 1 (fTrue).
6 footnote continuation separator defined when == 1 (fTrue).
5 footnote continuation notice defined when == 1 (fTrue).
Definition of the bits of sep.grpfIhdt:
Bit position
7 header for even pages defined when == 1 (fTrue).
6 header for odd pages defined when == 1 (fTrue).
5 footer for even pages defined when == 1 (fTrue).
4 footer for odd pages defined when == 1 (fTrue).
3 header for first page of section defined when == 1 (fTrue).
2 footer for first page of section defined when == 1 (fTrue).
Given that a particular footnote separator exists, one can locate the text for that separator using the following algorithm:
Initially set ihdd (index into plcfhdd) to 0.
Scan bits 7, 6, and 5 of the dop.grpfIhdt in order looking for bit == 1 while you have not yet reached the bit
corresponding to the separator whose text is to be located. For each such bit ==1 add 1 to ihdd.
The value of ihdd that results is the index into plcfhdd that can be used to access the text of the separator.
Given that a particular header/footer exists for a particular section, one can locate the text for that header/footer using the
following algorithm:
initially set ihdd (index into plcfhdd) to 0.
scan bits 7, 6, and 5 of the dop.grpfIhdt looking for bit == 1 and add 1 to ihdd for each such bit == 1.
Examine the sep.grpfIhdt of each section preceding the section of the header/footer to be located in ascending section
number order, scanning bits 7, 6, 5, 4, 3, and 2 of the sep.grpfIhdt in order, adding 1 to ihdd for each bit == 1.
For the section of the header/footer to be located, scan bits 7, 6, 5, 4, 3, and 2 of the sep.grpfIhdt in order looking for bit
== 1 while you have not yet reached the bit corresponding to the header/footer to be located. For each such bit ==1 add
1 to ihdd.
The value of ihdd that results is the index into plcfhdd that can be used to access the text of the header/footer.
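Both ihdd algorithms above amount to counting set bits in a fixed order. The Python sketch below shows one way to express them. The names are hypothetical, and bit position 7 is assumed to mean the bit with mask 0x80 (bit 0 being the least significant bit).

```python
DOP_BITS = {"separator": 7, "continuation_separator": 6, "continuation_notice": 5}
SEP_BITS = {"header_even": 7, "header_odd": 6, "footer_even": 5,
            "footer_odd": 4, "header_first": 3, "footer_first": 2}

def count_set(grpf_ihdt, bits):
    """Count how many of the given bit positions are set."""
    return sum((grpf_ihdt >> bit) & 1 for bit in bits)

def ihdd_for_separator(dop_grpf_ihdt, kind):
    target = DOP_BITS[kind]
    if not (dop_grpf_ihdt >> target) & 1:
        raise ValueError("separator is not defined")
    # Count the defined separators that come before the target bit.
    return count_set(dop_grpf_ihdt, range(7, target, -1))

def ihdd_for_header_footer(dop_grpf_ihdt, sep_grpf_ihdts, isec, kind):
    target = SEP_BITS[kind]
    if not (sep_grpf_ihdts[isec] >> target) & 1:
        raise ValueError("header/footer is not defined for this section")
    ihdd = count_set(dop_grpf_ihdt, (7, 6, 5))            # all separators
    for grpf in sep_grpf_ihdts[:isec]:                    # preceding sections
        ihdd += count_set(grpf, (7, 6, 5, 4, 3, 2))
    # Earlier header/footer types within the target section.
    return ihdd + count_set(sep_grpf_ihdts[isec], range(7, target, -1))
```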
Page Table
The plcfpgd, referenced by the fib, gives the location of page breaks within a Word document and may optionally be
saved in a Word binary file. If there are n page breaks calculated for a document, the plcfpgd would consist of n+1 CP
entries followed by n PGD entries.
Third-party creators of Word for Windows files should not attempt to create a plcfpgd. It can only be created properly
using Word for Windows' page layout routines. If a Word for Windows document is edited in any way, the plcfpgd
should be deleted by setting fib.cbPlcfpgd to 0.
If there are n page breaks recorded for the document stored, the n+1st CP stored in the array of CPs for the plcfpgd will
have the value fib.ccpText + fib.ccpFtn + fib.ccpHdr + 1 if the document contains footnotes or header/footers and will
have the value fib.ccpText + fib.ccpFtn + fib.ccpHdr if the document contains no subdocuments.
Glossary Files
beginning positions within the text address space of the file of the text of glossary entries.
The sttbfglsy begins with an integer count of bytes of the size of the sttbfglsy (includes the size of the integer count of
bytes). If there are n glossary entries defined, there will follow n Pascal-type strings (string preceded by length byte)
concatenated one after the other which store glossary entry names. The glossary entry names must be sorted in case-
insensitive ascending order. (i.e. a and A are treated as equal). Also the names date and time must be included in the list
of names. The name of the ith glossary entry is the ith name defined in the sttbfglsy.
If there are n glossary entries, the plcfglsy will consist of n+2 CP entries. The ith CP entry will contain the location of
the beginning of the text for the ith glossary entry. The i + 1st CP entry will contain the limit CP of the ith glossary
entry. The character at a CP position of limit CP - 1 is always a paragraph mark. The n + 2nd CP entry always contains
fib.ccpText + fib.ccpFtn + fib.ccpHdr + 1 if there are headers, footers or footnotes stored in the glossary and contains
fib.ccpText + fib.ccpFtn + fib.ccpHdr otherwise. The n+1st CP entry is always 1 less than the value of the n + 2nd
entry.
The text for the time and date entries will always be a single paragraph mark (ASCII 13).
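Reading the glossary entry names out of the sttbfglsy can be sketched in Python as follows. The function name is hypothetical, the leading count is assumed to be a 2-byte little-endian integer, and the names are decoded as Windows ANSI (code page 1252), which is also an assumption.

```python
import struct

def parse_sttbfglsy(data):
    """Return the list of glossary entry names stored in a sttbfglsy."""
    # The leading integer counts all bytes of the table, itself included.
    (cb,) = struct.unpack_from("<H", data, 0)
    names = []
    pos = 2
    while pos < cb:
        length = data[pos]                      # Pascal-type length byte
        names.append(data[pos + 1:pos + 1 + length].decode("cp1252"))
        pos += 1 + length
    return names
```

The name at position i in the returned list belongs to the glossary entry whose text begins at the ith CP of the plcfglsy.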
sttbfAssoc (Table of Associated Strings)
The following are indices into a table of associated strings:
ibst index description
ibstAssocFileNext 0 unused
ibstAssocDot 1 filename of associated template
ibstAssocTitle 2 title of document
ibstAssocSubject 3 subject of document
ibstAssocKeyWords 4 keywords of document
ibstAssocComments 5 comments of document
ibstAssocAuthor 6 author of document
ibstAssocLastRevBy 7 name of person who last revised the document
ibstAssocDataDoc 8 filename of data document
ibstAssocHeaderDoc 9 filename of header document
ibstAssocCriteria1 10 packed string used by print merge record selection
ibstAssocCriteria2 11 packed string used by print merge record selection
ibstAssocCriteria3 12 packed string used by print merge record selection
ibstAssocCriteria4 13 packed string used by print merge record selection
ibstAssocCriteria5 14 packed string used by print merge record selection
ibstAssocCriteria6 15 packed string used by print merge record selection
ibstAssocCriteria7 16 packed string used by print merge record selection
ibstAssocMax 17 maximum number of strings in string table
The format of the ibstAssocCriteriaX strings is as follows:
int cbIbstAssoc:8; // BYTE 0 size of ibstAssocCriteriaX string
int fCompOr:1; // BYTE 1 set if condition is an or condition
int iCompOp:7; // BYTE 1 index of Comparison Operator
char stMergeField[]; // Name of Merge Field
char stCompInfo[]; // User Supplied Comparison Information
Both stMergeField and stCompInfo are variable length character arrays preceded by a length byte.
BRC: Border Code
The BRC is a substructure of the PAP, PIC and TC. See also the obsolete BRC10 structure.
Dec  Hex  field         type  size  bitfield  comments
0    0    dxpLineWidth  int   :3    0007      width of a single line of border in units of
                                              0.75 points. Each line in the border is this
                                              wide (e.g. a double border is three lines).
                                              Must be nonzero when brcType is
                                              nonzero. Max width currently used = 4,
                                              max width allowed = 5.
          brcType       int   :2    0018      border type code
                                              0 none
                                              1 single
                                              2 thick
                                              3 double
          fShadow       int   :1    0020      when 1, border is drawn with shadow.
                                              Must be 0 when BRC is a substructure of
                                              the TC
          ico           int   :5    07C0      color code (see chp.ico)
          dxpSpace      int   :5    F800      width of space to maintain between border
                                              and text within border. Must be 0 when
                                              BRC is a substructure of the TC. Stored
                                              in points for Windows.
sizeof(BRC) == 2.
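Using the bitfield masks listed for the BRC, the five fields can be unpacked from the 16-bit value as in this Python sketch. The function name is hypothetical, and the value is assumed to have already been read from the file as an unsigned 16-bit integer.

```python
def decode_brc(brc):
    """Split a 16-bit BRC value into its fields using the documented masks."""
    return {
        "dxpLineWidth": brc & 0x0007,            # units of 0.75 points
        "brcType": (brc & 0x0018) >> 3,          # 0 none, 1 single, 2 thick, 3 double
        "fShadow": (brc & 0x0020) >> 5,
        "ico": (brc & 0x07C0) >> 6,              # color code, see chp.ico
        "dxpSpace": (brc & 0xF800) >> 11,        # points
    }
```

For example, the value 0x1189 decodes to a single border one unit wide, with no shadow, color code 6 and 2 points of space.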
BRC10: Border Code for Word for Windows 1.0
Dec  Hex  field            type  size  bitfield  comments
0    0    dxpLine2Width    int   :3    0007      width of second line of border in pixels
          dxpSpaceBetween  int   :3    0038      distance to maintain between both lines of
                                                 border in pixels
          dxpLine1Width    int   :3    01C0      width of first border line in pixels
          dxpSpace         int   :5    3E00      width of space to maintain between border
                                                 and text within border. Must be 0 when
                                                 BRC is a substructure of the TC.
          fShadow          int   :1    4000      when 1, border is drawn with shadow.
                                                 Must be 0 when BRC10 is a substructure
                                                 of the TC.
          fSpare           int   :1    8000      reserved
The seven types of border lines that Word for Windows 1.0 supports are coded with different sets of values for
dxpLine1Width, dxpSpaceBetween, and dxpLine2Width.
The border lines and their brc10 settings follow:
line type               dxpLine1Width                       dxpSpaceBetween  dxpLine2Width
no border               0                                   0                0
single line border      1                                   0                0
two single line border  1                                   1                1
fat solid border        4                                   0                0
hairline border         7 (special value meaning hairline)  0                0
When the no border settings are stored in the BRC, brc.fShadow and brc.dxpSpace should be set to 0.
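The BRC10 fields and the line type table above can be combined into a small decoder. The Python sketch below is illustrative only: the function name is hypothetical, and only the line types listed in the table are recognised, so any other combination yields None.

```python
# (dxpLine1Width, dxpSpaceBetween, dxpLine2Width) -> line type, from the table.
BRC10_LINE_TYPES = {
    (0, 0, 0): "no border",
    (1, 0, 0): "single line border",
    (1, 1, 1): "two single line border",
    (4, 0, 0): "fat solid border",
    (7, 0, 0): "hairline border",
}

def decode_brc10(brc10):
    """Split a 16-bit BRC10 value into its fields and name its line type."""
    fields = {
        "dxpLine2Width": brc10 & 0x0007,
        "dxpSpaceBetween": (brc10 & 0x0038) >> 3,
        "dxpLine1Width": (brc10 & 0x01C0) >> 6,
        "dxpSpace": (brc10 & 0x3E00) >> 9,
        "fShadow": (brc10 & 0x4000) >> 14,
    }
    key = (fields["dxpLine1Width"], fields["dxpSpaceBetween"], fields["dxpLine2Width"])
    fields["lineType"] = BRC10_LINE_TYPES.get(key)   # None when not in the table
    return fields
```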
CHP/CHPX: Character Properties
The CHP and the CHPX have exactly the same field structure. They differ in how the fields are interpreted. Listed
below is the format of the CHP/CHPX with the interpretations for each field listed in the comment column.
The CHP is never stored in Word files. It is the result of decompression operations applied to CHPXs.
The CHPX is stored in CHPX FKPs and within the STSH.
(Note: when a CHPX is stored in an FKP it is prefixed by a one-byte count of bytes that records the size of the non-zero
prefix of the CHPX. Since the count of bytes must begin on an even boundary within the FKP followed by the non-zero
prefix, it's guaranteed that the int and FC fields of the CHPX are aligned on an odd-byte boundary. Using normal integer
or long load instructions will cause address errors on a 68000. The best technique for reconstituting the CHPX is to move
the non-zero prefix to the beginning of a local instance of a CHPX that has been cleared to zeros.)
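The reconstitution technique described in the note can be sketched in Python as shown below. The function and parameter names are hypothetical. In particular, the full size of a CHPX (cb_chpx) is passed in by the caller and is not a value taken from this document.

```python
def reconstitute_chpx(fkp, offset, cb_chpx):
    """Rebuild a full-size CHPX from the non-zero prefix stored in an FKP.

    fkp     -- bytes of the FKP page
    offset  -- position of the one-byte count that precedes the prefix
    cb_chpx -- size in bytes of a complete CHPX (supplied by the caller)
    """
    cb = fkp[offset]                              # size of the non-zero prefix
    prefix = fkp[offset + 1:offset + 1 + cb]
    if cb > cb_chpx or len(prefix) != cb:
        raise ValueError("bad CHPX prefix length")
    # Copy the prefix to the front of a zero-filled CHPX.
    return prefix + bytes(cb_chpx - cb)
```

Working on a copy in this way avoids reading the int and FC fields at the odd addresses they occupy inside the FKP.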
Dec Hex field type size bitfield comment
0 0 fBold int :1 0001 for the CHP, text is bold when 1 , and not
bold when 0.
for the CHPX, text boldness is opposite of
the boldness of the style's CHP when 1;
same as style when 0.
fItalic int :1 0002 CHP: italic when 1, not italic when 0
CHPX: opposite of style when 1, same as
style when 0.
fRMarkDel int :1 0004 CHP: displayed with revision mark
strikethrough when 1, no revision mark
strikethrough when 0.
CHPX: opposite of style when 1, same as
style when 0.
fOutline int :1 0008 CHP: outlined when 1, not outlined when
0
CHPX: opposite of style when 1, same as
style when 0.
fFldVanish int :1 0010 <needs work>
fSmallCaps int :1 0020 CHP: displayed with small caps when 1,
no small caps when 0
CHPX: opposite of style when 1, same as
style when 0.
fCaps int :1 0040 CHP: displayed with caps when 1, no caps
when 0
CHPX: opposite of style when 1, same as
style when 0.
fVanish int :1 0080 CHP: vanished when 1, not vanished
when 0
CHPX: opposite of style when 1, same as
style when 0.
1 1 fRMark int :1 0100
CHPX: opposite of style when 1, same as
style when 0.
fStrike int :1 0400 CHP: displayed with strikethrough when 1,
no strikethrough when 0
CHPX: opposite of style when 1, same as
style when 0.
fObj int :1 0800 CHP: embedded object when 1, not an
embedded object when 0
CHPX: opposite of style when 1, same as
style when 0.
1 1 fBoldBi int :1 1000 for the CHP, bidi text is bold when 1 , and
not bold when 0.
for the CHPX, bidi text boldness is
opposite of the boldness of the style's
CHP when 1; same as style when 0.
1 1 fItalicBi int :1 2000 CHP: bidi text is italic when 1, not italic
when 0
CHPX: opposite of style when 1, same as
style when 0.
1 1 fBiDi int :1 4000 BIDI run when 1, latin run when 0
1 1 fDiacUSico int :1 8000 diacritics use latin color when 1
int :4 F000 reserved
2 2 fsIco int :1 0001 CHP: ignored
CHPX: paragraph chp.ico contents are
different from the style CHP's contents.
fsFtc int :1 0002 CHP: ignored
CHPX: chp.ftc is different
fsHps int :1 0004 CHP: ignored
CHPX: chp.hps is different
fsKul int :1 0008 CHP: ignored
CHPX: chp.kul is different
fsPos int :1 0010 CHP: ignored
CHPX: chp.hpsPos is different
fsSpace int :1 0020 CHP: ignored
CHPX: chp.qpsSpace is different
fsLid int :1 0040 CHP: ignored
CHPX: chp.lid is different
fsIcoBi int :1 0080 CHP: ignored
CHPX: paragraph chp.icoBi contents are
different from the style CHP's contents.
fsFtcBi int :1 0100 CHP: ignored
CHPX: chp.ftcBi is different
fsHpsBi int :1 0200 CHP: ignored
CHPX: chp.hpsBi is different
fsLidBi int :1 0400 CHP: ignored
CHPX: chp.lidBi is different
int :5 F800 CHP: ignored
int :9 FF80 CHP: ignored
4 4 ftc WORD font code
6 6 hps WORD font size in half points
1, 62 = -2,...,57 = -7)
fSysVanish int :1 0040 used by Word internally, not stored in file
chp.fNumRun int :1 0080 numbers run when 1
wSpare2 int :1 0080 reserved
9 9 ico int :5 1F00 color of text:
0 Auto 9 DkBlue
1 Black 10 DkCyan
2 Blue 11 DkGreen
3 Cyan 12 DkMagenta
4 Green 13 DkRed
5 Magenta 14 DkYellow
6 Red 15 DkGray
7 Yellow 16 LtGray
8 White
kul int :3 E000 underline code:
0 none
1 single
2 by word
3 double
4 dotted
10 A hpsPos BYTE position in half points; 0 for normal;
positive for superscript; negative for
subscripts (2's complement signed
number; 256 - hpsPos is the absolute
value for negative numbers)
11 B icoBi BYTE color of Bidi text. values same as chp.ico
11 B wSpare3 BYTE reserved
12 C lid LID language identification code (see
following table)
14 E ftcBi WORD bidi font code
16 10 hpsBi WORD bidi font size in half points
18 12 lidBi LID bidi language identification code (see
following table)
20 14 fcPic FC when character is a picture character
(character is 0x01 and chp.fSpec is 1)
20 14 fcObj FC when character is an object character
(character is 0x20 and chp.fSpec is 1)
23 17 fnPic BYTE used by Word internally.
20 14 hpsLargeChp int
14 E fcPic FC when character is a picture character
(character is 0x01 and chp.fSpec is 1)
14 E fcObj FC when character is an object character
(character is 0x20 and chp.fSpec is 1)
17 11 fnPic BYTE used by Word internally.
14 E hpsLargeChp int
sizeof(CHP) == 18 == 0x12.
sizeof(CHP) == 12 == 0xC.
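The CHP/CHPX convention in the table above (a CHPX bit of 1 means "opposite of the style", 0 means "same as the style") behaves like an exclusive-or against the style's CHP for the toggle bits. The following is a minimal Python sketch of that idea, not Word's actual code. The function name apply_chpx_toggles and the choice of which bits to include in TOGGLE_MASK are illustrative assumptions; the mask values are taken from the table.

```python
# Bit masks copied from the first flag word of the CHP table above.
F_BOLD = 0x0001
F_ITALIC = 0x0002
F_OUTLINE = 0x0008
F_SMALL_CAPS = 0x0020
F_CAPS = 0x0040
F_VANISH = 0x0080

# Assumption: only these bits are treated as toggles in this sketch.
TOGGLE_MASK = F_BOLD | F_ITALIC | F_OUTLINE | F_SMALL_CAPS | F_CAPS | F_VANISH


def apply_chpx_toggles(style_flags, chpx_flags):
    """Derive a run's flag word from the style's CHP flags and a CHPX.

    A CHPX bit of 1 flips the style's value; a bit of 0 keeps it.
    Bits outside TOGGLE_MASK are passed through from the style unchanged.
    """
    toggled = (style_flags ^ chpx_flags) & TOGGLE_MASK
    untouched = style_flags & ~TOGGLE_MASK & 0xFFFF
    return toggled | untouched
```

For example, a style that is bold combined with a CHPX that sets both fBold and fItalic yields text that is italic but not bold.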
Language Name LID Language Name LID Language Name LID
Albanian 0x041c French 0x040c Norwegian - Nynorsk 0x0814
Arabic 0x0401 French, Belgian 0x080c Polish 0x0415
Chinese, Traditional 0x0404 German, Swiss 0x0807 Romanian 0x0418
Chinese, Simplified 0x0804 Greek 0x0408 Russian 0x0419
Croato-Serbian 0x041a Hebrew 0x040d Serbo-Croatian 0x081a
(Latin) (cyrillic)
Czech 0x0405 Hungarian 0x040e Slovak 0x041b
Danish 0x0406 Icelandic 0x040f Spanish, Castilian 0x040a
Dutch 0x0413 Italian 0x0410 Spanish, Mexican 0x080a
Dutch, Belgian 0x0813 Italian, Swiss 0x0810 Swedish 0x041d
English, Australian 0x0c09 Japanese 0x0411 Thai 0x041e
English, U.K. 0x0809 Korean 0x0412 Turkish 0x041f
English, U.S. 0x0409 Norwegian - Bokmal 0x0414 Urdu 0x0420
Finnish 0x040b
CHP10/CHPX: Character Properties for Word for Windows 1.0
Dec Hex field type size bitfield comment
0 0 fBold int :1 0001 for the CHP, text is bold when 1, and not
bold when 0.
for the CHPX, text boldness is opposite of
the boldness of the style's CHP when 1;
same as style when 0.
fItalic int :1 0002 CHP: italic when 1, not italic when 0
CHPX: opposite of style when 1, same as
style when 0.
fStrike int :1 0004 CHP: displayed with strikethrough when 1,
no strikethrough when 0
CHPX: opposite of style when 1, same as
style when 0.
fOutline int :1 0008 CHP: outlined when 1, not outlined when
0
CHPX: opposite of style when 1, same as
style when 0.
fFldVanish int :1 0010 <needs work>
fSmallCaps int :1 0020 CHP: displayed with small caps when 1,
no small caps when 0
CHPX: opposite of style when 1, same as
style when 0.
fCaps int :1 0040 CHP: displayed with caps when 1, no caps
when 0
CHPX: opposite of style when 1, same as
style when 0.
fVanish int :1 0080 CHP: vanished when 1, not vanished
when 0
CHPX: opposite of style when 1, same as
style when 0.
1 1 fRMark int :1 0100 <needs work>
fSpec int :1 0200 CHP: character is a Word special
character when 1, not a special character
when 0
CHPX: opposite of style when 1, same as
style when 0.
fsFtc int :1 0800 CHP: ignored
CHPX: chp.ftc is different
fsHps int :1 1000 CHP: ignored
CHPX: chp.hps is different
fsKul int :1 2000 CHP: ignored
CHPX: chp.kul is different
fsPos int :1 4000 CHP: ignored
CHPX: chp.hpsPos is different
fsSpace int :1 8000 CHP: ignored
CHPX: chp.qpsSpace is different
2 2 ftc uns font code
4 4 hps uns char font size in half points
5 5 hpsPos uns char position in half points: 0 for normal;
positive for superscript; negative for
subscripts (2's complement signed
number; 256 - hpsPos is the absolute
value for negative numbers)
6 6 qpsSpace int :6 003F space following the character in quarter
point units (range -7 through +56 qp's;
represented in excess-56 notation: 63 = -
1, 62 = -2,...,57 = -7)
wSpare2 int :2 00C0 reserved
ico int :4 0F00 color of text:
0 Black
1 Blue
2 Cyan
3 Green
4 Magenta
5 Red
6 Yellow
7 White
kul int :3 7000 underline code:
0 none
1 single
2 by word
3 double
4 dotted
fSysVanish int :1 8000 used by Word internally, not stored in file
8 8 fcPic FC when character is a picture or hand-
annotation character (character is 0x01 or
0x07 and chp.fSpec is 1)
11 B fnPic uns char used by Word internally.
8 8 hpsLargeChp int
sizeof(CHP) == 12 == 0xC.
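The qpsSpace and hpsPos encodings described in the table above can be decoded as follows. This is a small Python sketch under the assumption that the raw field values have already been extracted from the CHP; the function names are hypothetical.

```python
def decode_qps_space(raw6):
    """Decode the 6-bit qpsSpace field into quarter points.

    Per the table, values 0..56 are positive spacing and
    63 = -1, 62 = -2, ..., 57 = -7.
    """
    raw6 &= 0x3F
    return raw6 - 64 if raw6 > 56 else raw6


def decode_hps_pos(raw8):
    """Decode the hpsPos byte (2's complement) into signed half points.

    For negative numbers, 256 - raw8 is the absolute value.
    """
    raw8 &= 0xFF
    return raw8 - 256 if raw8 >= 128 else raw8
```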
DOP: Document Properties
Dec Hex field type size bitfield default value comment
0 0 fFacingPages int :1 0001 0 1 when facing pages should
be printed
fPMHMainDoc int :1 0004 0 1 when doc is a main doc
for Print Merge Helper, 0
when not; default = 0
grfSuppression int :2 0018 0 Default line suppression
storage; 0= form letter line
suppression; 1= no line
suppression; default = 0
fpc int :2 0060 1 footnote position code
0 print as endnotes
1 print at bottom of page
2 print immediately
beneath text
int :1 0080 0 unused
1 1 grpfIhdt int :8 FF00 0 specification of document
headers and footers. See
explanation under Headers
and Footers topic.
2 2 fFtnRestart int :1 0001 1 == 1 when footnote number
is to be reset to 1 for each
page
nFtn int :15 FFFE 1 initial footnote number for
document
4 4 irmBar BYTE 00FF
5 5 irmProps int :7 7F00
fRevMarking int :1 8000
6 6 fBackup int :1 0001 always make backup when
document saved when 1.
fExactCWords int :1 0002
fPagHidden int :1 0004
fPagResults int :1 0008
fLockAtn int :1 0010
fMirrorMargins int :1 0020 swap margins on left/right
pages when 1.
fKeepFileFormat int :1 0040 save as original file format
when 1
fDfltTrueType int :1 0080 Use TrueType fonts by
default
7 7 fPagSuppressTopSpacing int :1 0100
fRTLAlignment int :1 0200 Document is RTL if 1
int :6 FC00
int :7 FE00
8 8 fSpares int :16 FFFF
10 A dxaTab uns 720 twips default tab width
12 C ftcDefaultBi uns index to default font in sttb
12 C wSpare uns
14 E dxaHotZ uns
16 10 wSpare2 uns
18 12 wSpare3 uns reserved
20 14 dttmCreated DTTM
24 18 dttmRevised DTTM
28 1C dttmLastPrint DTTM
32 20 nRevision int
46 2E cPg int
48 30 rgwSpareDocSum int[2]
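As an illustration of how the bitfield columns in the DOP table are read, the sketch below unpacks the first two 16-bit words of a DOP. It is a Python sketch only: the function name decode_dop_head is hypothetical, and it assumes the words are stored little-endian (Intel byte order) and that the DOP bytes have already been read from the file at fib.fcDop.

```python
import struct


def decode_dop_head(dop):
    """Decode selected bitfields from the first two words of a DOP."""
    w0, w1 = struct.unpack_from("<HH", dop, 0)
    return {
        "fFacingPages": w0 & 0x0001,
        "fpc": (w0 & 0x0060) >> 5,        # footnote position code
        "grpfIhdt": (w0 & 0xFF00) >> 8,   # header/footer specification
        "fFtnRestart": w1 & 0x0001,
        "nFtn": (w1 & 0xFFFE) >> 1,       # initial footnote number
    }
```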
DTTM: Date and Time (internal date format)
Dec Hex field type size bitfield comment
0 0 mint unsigned :6 003F minutes (0-59)
hr unsigned :5 07C0 hours (0-23)
dom unsigned :5 F800 days of month (1-31)
2 2 mon unsigned :4 000F months (1-12)
yr unsigned :9 1FF0 years (1900-2411)-1900
wdy unsigned :3 E000 weekday, Sunday = 0, Monday = 1, Tuesday
= 2, Wednesday = 3, Thursday = 4, Friday = 5,
Saturday = 6
sizeof(DTTM) == 4.
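The DTTM layout above packs a date and time into two 16-bit words. A minimal Python sketch of a decoder follows; the function name decode_dttm and the dictionary keys are illustrative, and little-endian (Intel) byte order is assumed.

```python
import struct


def decode_dttm(raw):
    """Decode a 4-byte DTTM using the masks from the table above."""
    lo, hi = struct.unpack("<HH", raw[:4])
    return {
        "minute": lo & 0x003F,
        "hour": (lo & 0x07C0) >> 6,
        "day": (lo & 0xF800) >> 11,
        "month": hi & 0x000F,
        "year": 1900 + ((hi & 0x1FF0) >> 4),  # stored as year - 1900
        "weekday": (hi & 0xE000) >> 13,       # Sunday = 0
    }
```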
FIB: File Information Block
Dec Hex field type size bitfield comment
0 0 wIdent uns magic number (added values for Bidi)
2 2 nFib uns FIB version written (added value for Bidi)
4 4 nProduct uns product version written by
6 6 lid uns language stamp---localized version;
In Word for Windows 1.x files this value
was the nLocale. If value is < 999, then it
is the nLocale, otherwise it is the lid.
8 8 pnNext PN
10 A fDot uns :1 0001
fGlsy uns :1 0002
fComplex uns :1 0004 when 1, file is in complex, fast-saved
format.
fHasPic uns :1 0008 file contains 1 or more pictures
cQuickSaves uns :4 00F0 count of times file was quicksaved
11 B fEncrypted uns :1 0100 1 if file is encrypted, 0 if not
uns :7 FF00 unused
12 C nFibBack uns new values for Bidi
14 E Spare long reserved
18 12 rgwSpare0 uns[3] reserved
24 18 fcMin FC file offset of first character of text. In non-
complex files a CP can be transformed
into an FC by the following transformation:
fc = cp + fib.fcMin.
28 1C fcMac FC file offset of last character of text in
document text stream + 1
32 20 cbMac FC file offset of last byte written to file + 1.
36 24 fcSpare0 FC reserved
40 28 fcSpare1 FC reserved
44 2C fcSpare2 FC reserved
48 30 fcSpare3 FC reserved
52 34 ccpText CP length of main document text stream
56 38 ccpFtn CP length of footnote subdocument text
stream
stream
Note: when ccpFtn == 0 and ccpHdd == 0
and ccpMcr == 0 and ccpAtn == 0, then
fib.fcMac = fib.fcMin + fib.ccpText. If
ccpFtn != 0 or ccpHdd != 0 or ccpMcr !=
0 or ccpAtn != 0, then fib.fcMac =
fib.fcMin + fib.ccpText + fib.ccpFtn +
fib.ccpHdd + ccpMcr + ccpAtn + 1. The
two characters stored beginning at file
position fib.fcMac - 2 must always be a
CRLF pair (ASCII 13, ASCII 10).
72 48 ccpSpare0 CP reserved
76 4C ccpSpare1 CP reserved
80 50 ccpSpare2 CP reserved
84 54 ccpSpare3 CP reserved
88 58 fcStshfOrig FC file offset of original allocation for STSH in
file. During fast save Word will attempt to
reuse this allocation if STSH is small
enough to fit.
92 5C cbStshfOrig uns count of bytes of original STSH allocation
94 5E fcStshf FC file offset of STSH in file.
98 62 cbStshf uns count of bytes of current STSH allocation
100 64 fcPlcffndRef FC file offset of footnote reference PLC. CPs
in PLC are relative to main document text
stream and give location of footnote
references. The structure stored in this
plc, called the FRD (footnote reference
descriptor) is two bytes long.
104 68 cbPlcffndRef uns count of bytes of footnote reference PLC
== 0 if no footnotes defined in document.
106 6A fcPlcffndTxt FC file offset of footnote text PLC. CPs in PLC
are relative to footnote subdocument text
stream and give location of beginnings of
footnote text for corresponding references
recorded in plcffndRef. No structure is
stored in this plc. There will just be n+1
FC entries in this PLC when there are n
footnotes
110 6E cbPlcffndTxt uns count of bytes of footnote text PLC.
== 0 if no footnotes defined in document
112 70 fcPlcfandRef FC file offset of annotation reference PLC.
116 74 cbPlcfandRef uns
118 76 fcPlcfandTxt FC file offset of annotation text PLC.
122 7A cbPlcfandTxt uns
124 7C fcPlcfsed FC file offset of section descriptor PLC. CPs
in PLC are relative to main document. The
length of the SED is 6 bytes.
128 80 cbPlcfsed uns count of bytes of section descriptor PLC.
8 bytes.
134 86 cbPlcfpgd uns count of bytes of page descriptor PLC.
==0 if file was never repaginated. Should
not be written by third party creators of
Word files.
136 88 fcPlcfphe FC file offset of PLC of paragraph heights.
CPs in PLC are relative to main document
text stream. Only written for files in
complex format. Should not be written by
third party creators of Word files. The PHE
is 6 bytes long.
140 8C cbPlcfphe uns count of bytes of paragraph height PLC.
==0 when file is non-complex.
142 8E fcSttbfglsy FC file offset of glossary string table
146 92 cbSttbfglsy uns count of bytes of glossary string table.
== 0 for non-glossary documents.
!=0 for glossary documents.
148 94 fcPlcfglsy FC file offset of glossary PLC. CPs in PLC are
relative to main document and mark the
beginnings of glossary entries and are in
1-1 correspondence with entries of
sttbfglsy. No structure is stored in this
PLC. There will be n+1 FC entries in this
PLC when there are n glossary entries.
152 98 cbPlcfglsy uns count of bytes of glossary PLC.
== 0 for non-glossary documents.
!=0 for glossary documents.
154 9A fcPlcfhdd FC byte offset of header PLC. CPs are
relative to header subdocument and mark
the beginnings of individual headers in the
header subdocument. No structure is
stored in this PLC. There will be n+1 FC
entries in this PLC when there are n
headers stored for the document.
158 9E cbPlcfhdd uns count of bytes of header PLC.
== 0 if document contains no headers
160 A0 fcPlcfbteChpx FC file offset of character property bin
table.plc. FCs in PLC are file offsets.
Describes text of main document and all
subdocuments. The BTE is 2 bytes long.
164 A4 cbPlcfbteChpx uns count of bytes of character property bin
table PLC.
166 A6 fcPlcfbtePapx FC file offset of paragraph property bin
table.plc. FCs in PLC are file offsets.
Describes text of main document and all
subdocuments. The BTE is 2 bytes long.
172 AC fcPlcfsea FC file offset of PLC reserved for private use.
The SEA is 6 bytes long.
176 B0 cbPlcfsea uns count of bytes of private use PLC.
178 B2 fcSttbfffn FC
182 B6 cbSttbfffn uns
184 B8 fcPlcffldMom FC
188 BC cbPlcffldMom uns
190 BE fcPlcffldHdr FC
194 C2 cbPlcffldHdr uns
196 C4 fcPlcffldFtn FC
200 C8 cbPlcffldFtn uns
202 CA fcPlcffldAtn FC
206 CE cbPlcffldAtn uns
208 D0 fcPlcffldMcr FC
212 D4 cbPlcffldMcr uns
214 D6 fcSttbfbkmk FC
218 DA cbSttbfbkmk uns
220 DC fcPlcfbkf FC
224 E0 cbPlcfbkf uns
226 E2 fcPlcfbkl FC
230 E6 cbPlcfbkl uns
232 E8 fcCmds FC
236 EC cbCmds uns
238 EE fcPlcmcr FC
242 F2 cbPlcmcr uns
244 F4 fcSttbfmcr FC
248 F8 cbSttbfmcr uns
250 FA fcPrDrvr FC file offset of the printer driver information
(names of drivers, port, etc.)
254 FE cbPrDrvr uns count of bytes of the printer driver
information (names of drivers, port, etc.)
256 100 fcPrEnvPort FC file offset of the print environment in
portrait mode.
260 104 cbPrEnvPort uns count of bytes of the print environment in
portrait mode.
262 106 fcPrEnvLand FC file offset of the print environment in
landscape mode.
268 10C fcWss FC file offset of Window Save State data
structure. WSS contains dimensions of
document's main text window and the last
selection made by Word user.
272 110 cbWss uns count of bytes of WSS. ==0 if unable to
store the window state. Should not be
written by third party creators of Word
files.
274 112 fcDop FC file offset of document property data
structure.
278 116 cbDop uns count of bytes of document properties.
280 118 fcSttbfAssoc FC
284 11C cbSttbfAssoc uns
286 11E fcClx FC file offset of beginning of information for
complex files. Consists of an encoding of
all of the prms quoted by the document
followed by the plcpcd (piece table) for
the document.
290 122 cbClx uns count of bytes of complex file information.
== 0 if file is non-complex.
292 124 fcPlcfpgdFtn FC file offset of page descriptor PLC for
footnote subdocument. CPs in PLC are
relative to footnote subdocument. Should
not be written by third party creators of
Word files.
296 128 cbPlcfpgdFtn uns count of bytes of page descriptor PLC for
footnote subdocument.
==0 if document has not been paginated.
The length of the PGD is 8 bytes.
298 12A fcAutosaveSource FC file offset of the name of the original file.
fcAutosaveSource and cbAutosaveSource
should both be 0 if autosave is off.
302 12E cbAutosaveSource uns count of bytes of the name of the original
file.
304 130 fcSpare5 FC
308 134 cbSpare5 uns
310 136 fcSpare6 FC
314 13A cbSpare6 uns
316 13C wSpare4 int
318 13E pnChpFirst PN
320 140 pnPapFirst PN
322 142 cpnBteChp PN count of CHPX FKPs recorded in file. In
non-complex files if the number of entries
in the plcfbteChpx is less than this, the
plcfbteChpx is incomplete.
plcfbtePapx is incomplete.
Note: If a table does not exist in the file, its cb in the FIB is zero and its fc is equal to that of the following table (the
latter equality is irrelevant, as the cb should be used to determine existence of the table).
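The FIB comments on fcMin and fcMac describe two small calculations: the CP to FC transformation for non-complex files, and the expected value of fcMac. The Python sketch below restates them. The function names are hypothetical, and the second function assumes the "+ 1" case applies whenever any of the subdocument lengths (ccpFtn, ccpHdd, ccpMcr, ccpAtn) is non-zero.

```python
def cp_to_fc(cp, fc_min):
    """Non-complex files only: fc = cp + fib.fcMin."""
    return cp + fc_min


def expected_fc_mac(fc_min, ccp_text, ccp_ftn=0, ccp_hdd=0, ccp_mcr=0, ccp_atn=0):
    """Value fib.fcMac should have, given the text stream lengths."""
    extra = ccp_ftn + ccp_hdd + ccp_mcr + ccp_atn
    if extra == 0:
        # Main document text only.
        return fc_min + ccp_text
    # One or more subdocuments present: one extra character is counted.
    return fc_min + ccp_text + extra + 1
```

A reader could use expected_fc_mac as a consistency check on a FIB before trusting the other offsets in it.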
FKP: Formatted Disk Page
offset (Dec) field type comments
0 rgfc array of FCs For CHPX FKPs, each FC is the limit FC
of a run of exception text.
For PAPX FKPs, each FC is the limit FC
of a paragraph (i.e. points to the next
character past an end of paragraph mark).
4 * (fkp.crun + 1) rgb array of bytes an array of bytes where each byte is the
word offset of a CHPX or PAPX. For
CHPXs, if the byte stored is 0, there is no
difference between run's character
properties and the style's character
properties. For PAPXs, if the byte stored
is 0, this represents a 1 line paragraph 15
pixels high with Normal style (stc == 0)
whose column width is 7980 dxas.
5 * fkp.crun + 4 unused space As new runs/paragraphs are recorded in
the FKP, unused space is reduced by 5 if
CHPX/PAPX is already recorded and is
reduced by 5 + sizeof(CHPX/PAPX) if
property is not already recorded.
for CHPX FKPs:
511-sizeof(grpchpx) grpchpx array of bytes grpchpx consists of all of the CHPXs
stored in FKP concatenated end to end.
Each CHPX is prefixed with a count of
bytes which records its length.
for PAPX FKPs:
511-sizeof(grppapx) grppapx array of bytes grppapx consists of all of the PAPXs
stored in FKP concatenated end to end.
Each PAPX begins with a count of words
which records its length padded to a word
boundary.
511 crun byte count of runs for CHPX FKP, count of paragraphs for PAPX FKP.
The PAP is never stored in a Word file. It is derived by expanding stored PAPXs.
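The FKP layout above can be walked as in the following Python sketch. It assumes a 512-byte page (crun is the byte at offset 511), 4-byte little-endian FCs, and that a non-zero rgb byte is a word offset measured from the start of the page; the function name parse_fkp is hypothetical.

```python
import struct

FKP_SIZE = 512  # assumption: one FKP occupies one 512-byte page


def parse_fkp(page):
    """Return (fc_first, fc_lim, byte_offset) for each run in an FKP.

    byte_offset is None when the rgb byte is 0, meaning no CHPX/PAPX
    is recorded for that run.
    """
    if len(page) != FKP_SIZE:
        raise ValueError("an FKP is expected to be 512 bytes long")
    crun = page[511]
    rgfc = struct.unpack_from("<%dL" % (crun + 1), page, 0)
    rgb_start = 4 * (crun + 1)
    rgb = page[rgb_start:rgb_start + crun]
    runs = []
    for i in range(crun):
        word_off = rgb[i]
        byte_off = word_off * 2 if word_off else None
        runs.append((rgfc[i], rgfc[i + 1], byte_off))
    return runs
```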
FLD: Field Descriptor
Dec Hex field type size bitfield comments
0 0 ch int :7 type of field boundary the FLD describes.
19 field begin mark
20 field separator
21 field end mark
fDirty int :1
variant used when fld.ch == 21 (field end mark)
1 1 fDiffer int :1 01 ignored for saved file
int :1 02 reserved
fResultDirty int :1 04 == 1, when user has edited or formatted
the result. ==0 otherwise
fResultEdited int :1 08 ==1, when user has inserted text into or
deleted text from the result.
fLocked int :1 10 ==1, when field is locked from recalc
fPrivateResult int :1 20 ==1, whenever the result of the field is
never to be shown.
fNested int :1 40 ==1, when field is nested within another
field
int :1 80 reserved
sizeof(FLD) == 2.
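A two-byte FLD can be decoded roughly as shown below. This Python sketch assumes that ch occupies the low 7 bits of the first byte and fDirty the high bit (consistent with how the other bitfields in this document are allocated), and it only interprets the second byte for the field end mark variant listed above. The function name decode_fld is hypothetical.

```python
FIELD_BEGIN, FIELD_SEPARATOR, FIELD_END = 19, 20, 21


def decode_fld(raw):
    """Decode a 2-byte FLD into a dictionary."""
    b0, b1 = raw[0], raw[1]
    fld = {"ch": b0 & 0x7F, "fDirty": bool(b0 & 0x80)}
    if fld["ch"] == FIELD_END:
        # Variant used when fld.ch == 21 (field end mark).
        fld["fDiffer"] = bool(b1 & 0x01)
        fld["fResultDirty"] = bool(b1 & 0x04)
        fld["fResultEdited"] = bool(b1 & 0x08)
        fld["fLocked"] = bool(b1 & 0x10)
        fld["fPrivateResult"] = bool(b1 & 0x20)
        fld["fNested"] = bool(b1 & 0x40)
    else:
        # Other variants are not interpreted here; keep the raw byte.
        fld["byte1"] = b1
    return fld
```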
flt Field Type flt Field Type
1 unknown keyword 32 quote Current Time variable
2 possible bookmark (syntax matches bookmark 33 quote Current Page variable
name)
3 bookmark reference 34 evaluate expression
4 index entry 35 insert literal text
5 footnote reference 36 Include command (Print Merge)
6 Set command (for Print Merge) 37 page reference
7 If command (for Print Merge) 38 Ask command (Print Merge)
8 create index 39 Fillin command to display prompt (Print Merge)
9 table of contents entry 40 Data command (Print Merge)
10 Style reference 41 Next command (Print Merge)
11 document reference 42 NextIf command (Print Merge)
12 sequence mark 43 SkipIf (Print Merge)
13 create table-of-contents 44 inserts number of current Print Merge record
14 quote Info variable 45 DDE reference
15 quote Title variable 46 DDE automatic reference
16 quote Subject variable 47 Inserts Glossary Entry
17 quote Author variable 48 sends characters to printer without translation
18 quote Keywords variable 49 Formula definition
19 quote Comments variable 50 Goto Button
20 quote Last Revised By variable 51 Macro Button
21 quote Creation Date variable 52 insert auto numbering field in outline format
22 quote Revision Date variable 53 insert auto numbering field in legal format
23 quote Print Date variable 54 insert auto numbering field in Arabic number
format
24 quote Revision Number variable 55 reads a TIFF file
25 quote Edit Time variable 56 Link
26 quote Number of Pages variable 57 Symbol
27 quote Number of Words variable 58 Embedded Object
28 quote Number of Characters variable 59 Merge fields
29 quote File Name variable 60 User Name
30 quote Document Template Name variable 61 User Initial
31 quote Current Date variable 62 User Address
0 0 lcb long length of object (including this header)
4 4 cbHeader int length of this header (for future use)
6 6 icf int index to clipboard format of object
sizeof(OBJHEADER) == 8.
PAP: Paragraph Properties
Dec Hex field type size bitfield comments
0 0 stc uns char style code. This is an index into the STSH
structure
1 1 jc uns char Justification Code
0 left justify
1 center
2 right justify
3 left and right justify
2 2 fSideBySide uns char side-by-side paragraph
3 3 fKeep uns char keep entire paragraph on one page if
possible
4 4 fKeepFollow uns char keep paragraph on same page with next
paragraph if possible
5 5 fPageBreakBefore uns char start this paragraph on new page
6 6 fUnused int :4 000F reserved
pcVert int :2 0030 vertical position code. Specifies coordinate
frame to use when paragraphs are
absolutely positioned.
0 vertical position coordinates are
relative to margin
1 coordinates are relative to page
2 coordinates are relative to text. This
means: relative to where the next non-
APO text would have been placed if this
APO did not exist.
pcHorz int :2 00C0 horizontal position code. Specifies
coordinate frame to use when paragraphs
are absolutely positioned.
0 horiz. position coordinates are
relative to column.
1 coordinates are relative to margin
2 coordinates are relative to page
/* the brcp and brcl fields have been superseded by the newly defined brcLeft, brcTop, etc. fields. They
remain in the PAP for compatibility with MacWord 3.0 */
7 7 brcp uns char rectangle border codes
0 none
1 border above
2 border below
15 box around
16 bar to left of paragraph
8 8 brcl uns char border line style
0 single
1 thick
2 double
3 shadow
line numbering)
12 C dxaRight int indent from right margin (signed).
14 E dxaLeft int indent from left margin (signed)
16 10 dxaLeft1 int first line indent; signed number relative to
dxaLeft
18 12 dyaLine int height of line. When 0, Word will
automatically allocate space to each line
so that every character is completely
visible. If positive, Word will set line
heights so that every line is at least
dyaLine dyas high. If negative, the height
of each line of the paragraph will be set
equal to the absolute value of dyaLine.
20 14 dyaBefore uns vertical spacing before paragraph
(unsigned)
22 16 dyaAfter uns vertical spacing after paragraph
(unsigned)
24 18 phe PHE height of current paragraph.
30 1E fInTable char when 1, paragraph is contained in a table
row
31 1F fTtp char when 1, paragraph consists only of the
row mark special character and marks the
end of a table row.
32 20 ptap TAP * used internally by Word
34 22 dxaAbs int when positive, is the horizontal distance
from the reference frame specified by
pap.pcHorz. 0 means paragraph is
positioned at the left with respect to the
reference frame specified by pcHorz.
Certain negative values have special
meaning:
-4 paragraph centered horizontally
within reference frame
-8 paragraph adjusted right within
reference frame
-12 paragraph placed immediately
inside of reference frame
-16 paragraph placed immediately
outside of reference frame
36 24 dyaAbs int when positive, is the vertical distance from
the reference frame specified by
pap.pcVert. 0 means paragraph's y-
position is unconstrained.
Certain negative values have special
meaning:
-4 paragraph is placed at top of
reference frame
-8 paragraph is centered vertically
within reference frame
-12 paragraph is placed at bottom of
reference frame.
40 28 brcTop BRC specification for border above paragraph
42 2A brcLeft BRC specification for border to the left of
paragraph
44 2C brcBottom BRC specification for border below paragraph
46 2E brcRight BRC specification for border to the right of
paragraph
48 30 brcBetween BRC specification of border to place between
conforming paragraphs. Two paragraphs
conform when both have borders, their
brcLeft and brcRight match, their widths
are the same, they both belong to tables
or both do not, and have the same
absolute positioning props.
50 32 brcBar BRC specification of border to place on outside
of text when facing pages are to be
displayed.
52 34 dxaFromText int horizontal distance to be maintained
between an absolutely positioned
paragraph and any non-absolute
positioned text
54 36 dyaFromText int vertical distance to be maintained
between an absolutely positioned
paragraph and any non-absolute
positioned text
56 38 wr byte Wrap Code for absolute objects
57 39 zz byte Reserved; currently unused
58 3A fTransparent byte Reserved, currently unused
59 3B fBiDi byte RTL paragraph when 1
59 3B bSpare byte Reserved
60 3C dyaHeight int :15 7FFF height of abs obj; 0 == Auto
fMinHeight int :1 8000 0 = Exact, 1 = At Least
62 3E shd SHD shading
64 40 itbdMac int number of tab stops defined for
paragraph. Must be >= 0 and <= 50.
66 42 rgdxaTab int[itbdMax] array of positions of itbdMac tab stops.
itbdMax == 50
166 A6 rgtbd char[itbdMax] array of itbdMac tab descriptors
sizeof(PAP) == 216 == 0xD8.
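The three cases described for pap.dyaLine in the table above (zero, positive, negative) can be summarized in a few lines. This is only a Python sketch of the stated rule; the function name and the rule labels "auto", "at_least" and "exact" are illustrative and do not appear in the file format.

```python
def line_height_rule(dya_line):
    """Interpret pap.dyaLine as a (rule, height) pair."""
    if dya_line == 0:
        # Word allocates space so every character is completely visible.
        return ("auto", 0)
    if dya_line > 0:
        # Every line is at least dyaLine high.
        return ("at_least", dya_line)
    # Negative: every line is exactly abs(dyaLine) high.
    return ("exact", -dya_line)
```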
PAPX: Paragraph Property Exceptions
The PAPX is stored within FKPs and within the STSH.
Dec Hex field type size bitfield comments
0 0 cw byte count of words of following data in PAPX.
The first byte of a PAPX is a count of
words when PAPX is stored in an FKP.
Count of words is used because PAPX in
an FKP can contain paragraph and table
sprms.
Count of bytes is used because only
paragraph sprms are stored in a STSH
PAPX.
1 1 stc byte style code of the style from which the
paragraph inherits its paragraph and
character properties
2 2 phe PHE encoding of paragraph height information
for paragraph.
8 8 grpprl character array a list of the sprms that encode the
differences between PAP for a paragraph
and the PAP for the style used. When a
paragraph bound is also the end of a table
row, the PAPX also contains a list of table
sprms which express the difference of
table row's TAP from an empty TAP that
has been cleared to zeros. The table
sprms are recorded in the list after all of
the paragraph sprms. See Sprms
definitions for list of sprms that are used in
PAPXs.
papx.cw is equal to (8 + sizeof(grpprl) + 1) / 2. If the size of the grpprl is odd, a byte of zero is stored immediately after
the grpprl to pad the PAPX so its length in bytes is papx.cw * 2.
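The cw formula and the padding rule above can be checked with a short sketch. The Python below assembles an FKP-style PAPX from its parts; the function name build_papx is hypothetical, and the 6-byte PHE is passed in as raw bytes (all zeros is what this document recommends for newly created files).

```python
def build_papx(stc, phe, grpprl):
    """Assemble a PAPX as stored in an FKP: cw, stc, phe, grpprl, padding."""
    if len(phe) != 6:
        raise ValueError("a PHE is 6 bytes long")
    cw = (8 + len(grpprl) + 1) // 2
    if cw > 255:
        raise ValueError("grpprl too long for a one-byte word count")
    body = bytes([cw, stc]) + bytes(phe) + bytes(grpprl)
    if len(grpprl) % 2:
        body += b"\x00"  # pad an odd-sized grpprl with a zero byte
    # The stated invariant: total length in bytes is papx.cw * 2.
    assert len(body) == cw * 2
    return body
```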
PCD: Piece Descriptor
Dec Hex field type size bitfield comment
0 0 fNoParaLast int :1 0001 when 1, means that piece contains no end
of paragraph marks.
fPaphNil int :1 0002 used internally by Word
* int :6
1 1 fn uns char used internally by Word
2 2 fc FC file offset of beginning of piece. The size
of the ith piece can be determined by
subtracting rgcp[i] of the containing
plcfpcd from its rgcp[i+1].
6 6 prm PRM contains either a single sprm or else an
index number of the grpprl which contains
the sprms that modify the properties of the
piece.
8 8 cbPCD
PGD: Page Descriptor
Dec Hex field type size bitfield comments
0 0 * int :5 001F
fGhost int :2 0060 redefine fEmptyPage and fAllFtn. true
when blank page or footnote only page
* int :9 FF10
0 0 fContinue int :1 0001 1 only when footnote is continued from
previous page
fRight int :1 0008 1 when right hand side page
fPgnRestart int :1 0010 1 when page number must be reset to 1.
fEmptyPage int :1 0020 1 when section break forced page to be
empty.
fAllFtn int :1 0040 1 when page contains nothing but
footnotes
* int :1 0080
bkc int :8 FF00 section break code
2 2 lnn uns line number of first line, -1 if no line
numbering
4 4 cl int count of lines into paragraph for first line.
6 6 pgn uns page number as printed
8 8 dcpDepend int number of characters at the beginning of
the next page that were considered for
inclusion on current page before page
break was forced.
sizeof(PGD) == 10 == 0xA.
PHE: Paragraph Height
The PHE is a substructure of the PAP and PAPX and is also stored in the PLCFPHE.
Dec Hex field type size bitfield comments
0 0 fSpare int :1 0001 reserved
fUnk int :1 0002 phe entry is invalid when == 1
fDiffLines int :1 0004 when 1, total height of paragraph is known
but lines in paragraph have different
heights.
* int :5 00F8 reserved
clMac int :8 FF00 when fDiffLines is 0 is number of lines in
paragraph
2 2 dxaCol int width of lines in paragraph
4 4 dylLine int when fDiffLines is 0, is height of every line
in paragraph, in pixels
4 4 dylHeight uns when fDiffLines is 1, is the total height in
pixels of the paragraph
4 4 fStyleDirty int when PAPXs are stored in STSH, this
indicates that the style containing this
PAPX has changed so paragraph height
information stored for paragraphs with this
style is invalid.
sizeof(PHE) == 6.
If there is no paragraph height information stored for a paragraph, all of the fields in the PHE are set to 0. If a paragraph
contains more than 127 lines, the clMac, dylLine variant cannot be used, so fDiffLines must be set to 1 and the total size
of the paragraph stored in dylHeight. If a paragraph height is greater than 32767 twips, the height cannot be represented
by a PHE so all fields of the PHE must be set to 0.
If a new Word for Windows file is created, the PHE of every PAPX created to describe the paragraphs of the file should
be set to 0. If a Word for Windows file is altered in place (a character of the file changed to a new character or a property
changed), the paragraph containing the change must have its papx.phe field set to 0.
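The PHE variants described above (selected by fDiffLines) can be decoded as in this Python sketch. It assumes little-endian 16-bit words and ignores the fStyleDirty variant used inside the STSH; the function name decode_phe is hypothetical.

```python
import struct


def decode_phe(raw):
    """Decode a 6-byte PHE using the masks from the table above."""
    w0, dxa_col = struct.unpack_from("<Hh", raw, 0)
    f_diff = bool(w0 & 0x0004)
    phe = {
        "fUnk": bool(w0 & 0x0002),   # entry is invalid when set
        "fDiffLines": f_diff,
        "dxaCol": dxa_col,
    }
    if f_diff:
        # Lines differ in height: only the total height is stored.
        phe["dylHeight"] = struct.unpack_from("<H", raw, 4)[0]
    else:
        phe["clMac"] = (w0 & 0xFF00) >> 8
        phe["dylLine"] = struct.unpack_from("<h", raw, 4)[0]
    return phe
```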
0 0 lcb long number of bytes in the PIC structure plus
size of following picture data which may
be a Windows metafile, a bitmap, or the
filename of a TIFF file.
4 4 cbHeader unsigned number of bytes in the PIC (to allow for
future expansion).
6 6 mfp.mm int
8 8 mfp.xExt int
10 A mfp.yExt int
12 C mfp.hMF int
If a Windows metafile is stored immediately following the PIC structure, the mfp is a Windows
METAFILEPICT structure. When the data immediately following the PIC is a TIFF filename, mfp.mm == 98. If
a bitmap is stored after the PIC, mfp.mm == 99.
When the PIC describes a bitmap, mfp.xExt is the width of the bitmap in pixels and mfp.yExt is the height of
the bitmap in pixels.
14 E bm BITMAP (14 bytes) Windows bitmap structure when PIC
describes a BITMAP.
14 E rcWinMF rect (8 bytes) rect for window origin and extents when
metafile is stored -- ignored if 0
28 1C dxaGoal int horizontal measurement in twips of the
rectangle the picture should be imaged
within.
30 1E dyaGoal int vertical measurement in twips of the
rectangle the picture should be imaged
within.
when scaling bitmaps, dxaGoal and dyaGoal may be ignored if the operation would cause the bitmap to
shrink or grow by a non-power-of-two factor
32 20 mx uns horizontal scaling factor supplied by user
expressed in .001% units.
34 22 my uns vertical scaling factor supplied by user
expressed in .001% units.
For all of the Crop values, a positive measurement means the specified border has been moved inward from
its original setting and a negative measurement means the border has been moved outward from its original
setting.
36 24 dxaCropLeft int the amount the picture has been cropped
on the left in twips.
38 26 dyaCropTop int the amount the picture has been cropped
on the top in twips.
40 28 dxaCropRight int the amount the picture has been cropped
on the right in twips.
42 2A dyaCropBottom int the amount the picture has been cropped
on the bottom in twips.
44 2C brcl int :4 000F Obsolete, superseded by brcTop, etc. In
Word for Windows 1.x, it was the type of
border to place around picture
0 single
1 thick
2 double
3 shadow
fFrameEmpty int :1 0010 picture consists of a single frame
int :11 reserved
52 34 brcRight BRC specification for border to the right of
picture
54 36 dxaOrigin int horizontal offset of hand annotation origin
56 38 dyaOrigin int vertical offset of hand annotation origin
58 3A rgb variable array of bytes containing
Windows metafile, bitmap or TIFF file
filename.
PLCF: Plex of CPs stored in File
offset (in decimal) field type comment
0 rgfc FC[ ] given that the size of PLCF is cb and the size of the
structure stored in plc is cbStruct, then the number of
structure instances stored in PLCF, iMac, is given by (cb -
4)/(4 + cbStruct). The number of FCs stored in the PLCF
will be iMac + 1.
4*(iMac + 1) rgstruct struct[ ] array of some arbitrary structure.
sizeof(PLC) == iMac * (4 + cbStruct) + 4.
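The size relationship above is enough to split a PLCF into its two arrays once cb (from the FIB) and cbStruct (from the description of the particular PLC) are known. The following Python sketch assumes 4-byte little-endian entries in rgfc; the function name split_plcf is hypothetical.

```python
import struct


def split_plcf(data, cb_struct):
    """Split PLCF bytes into (rgfc, rgstruct).

    iMac = (cb - 4) / (4 + cbStruct); rgfc holds iMac + 1 entries and
    rgstruct holds iMac structures of cb_struct bytes each.
    """
    cb = len(data)
    if cb < 4:
        raise ValueError("a PLCF holds at least one 4-byte entry")
    i_mac, rem = divmod(cb - 4, 4 + cb_struct)
    if rem:
        raise ValueError("cb is not consistent with cbStruct")
    rgfc = list(struct.unpack_from("<%dL" % (i_mac + 1), data, 0))
    base = 4 * (i_mac + 1)
    rgstruct = [
        data[base + i * cb_struct: base + (i + 1) * cb_struct]
        for i in range(i_mac)
    ]
    return rgfc, rgstruct
```

For a PLC that stores no structure (for example the footnote text PLC), cb_struct is 0 and rgstruct contains only empty entries.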
PRM: Property Modifier
The PRM has two variants. In the first variant, the PRM records a single one- or two-byte sprm whose opcode is less
than 128.
PRM: Property Modifier (variant 1)
Dec Hex field type size bitfield comment
0 0 fComplex int :1 0001 set to 0 for variant 1
sprm int :7 00FE sprm opcode
val int :8 FF00 sprm's second byte if necessary
In the second variant, prm.fComplex is 1, and the rest of the structure records an index to a grpprl stored in the CLX
(described in the Complex File Format topic).
PRM: Property Modifier (variant 2)
Dec Hex field type size bitfield comment
0 0 fComplex int :1 0001 set to 1 for variant 2
igrpprl int :15 FFFE index to a grpprl stored in CLX portion of
file.
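The two PRM variants can be told apart with a few mask operations. The sketch below is illustrative only: decode_prm is a hypothetical name, and it assumes the PRM has already been read as a 16-bit little-endian word whose bitfields match the masks in the tables above.

```python
def decode_prm(prm: int) -> dict:
    """Decode a 16-bit PRM word using the bitfield masks from the tables."""
    if prm & 0x0001:
        # Variant 2: the remaining 15 bits index a grpprl in the CLX.
        return {"fComplex": 1, "igrpprl": (prm & 0xFFFE) >> 1}
    # Variant 1: a 7-bit sprm opcode plus an optional second byte.
    return {
        "fComplex": 0,
        "sprm": (prm & 0x00FE) >> 1,
        "val": (prm & 0xFF00) >> 8,
    }
```

For example, the word 0x2A0A has fComplex == 0, sprm == 5 and val == 0x2A, while 0x0007 has fComplex == 1 and igrpprl == 3.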
SED: Section Descriptor
Dec Hex field type size bitfield comments
0 0 fSwap int :1 0001 runtime flag, indicates whether orientation
should be changed before printing. 0
indicates no change, 1 indicates
orientation change.
fUnk int :1 0002 used internally by Word for Windows
fn int :14 FFFC used internally by Word for Windows
2 2 fcSepx FC file offset to beginning of SEPX stored for
section. If sed.fcSepx == 0xFFFFFFFF,
the section properties for the section are
equal to the standard SEP (see the SEP
definition).
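A reader of the SED might unpack it along these lines. This is a sketch under stated assumptions: decode_sed and the uses_standard_sep key are hypothetical names, the SED is assumed to be 6 bytes long (a 16-bit flag word followed by a 4-byte FC), and the byte order is assumed to be little-endian.

```python
import struct

def decode_sed(raw: bytes) -> dict:
    """Decode a SED: a 16-bit flag word at offset 0, fcSepx at offset 2.

    Assumptions: little-endian storage and a 6-byte SED.
    """
    flags, fc_sepx = struct.unpack_from("<HI", raw, 0)
    return {
        "fSwap": flags & 0x0001,
        "fUnk": (flags & 0x0002) >> 1,
        "fn": (flags & 0xFFFC) >> 2,
        "fcSepx": fc_sepx,
        # 0xFFFFFFFF means no SEPX is stored; the standard SEP applies.
        "uses_standard_sep": fc_sepx == 0xFFFFFFFF,
    }
```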
SEP: Section Properties
Dec Hex field type comments
0 0 bkc uns char break code:
0 No break
1 New column
2 New page
3 Even page
4 Odd page
1 1 fTitlePage uns char set to 1 when a title page is to be displayed
2 2 ccolM1 int number of columns in section - 1.
4 4 dxaColumns int distance that will be maintained between
columns
6 6 fFacingCol char facing columns flag
6 6 bUnused1 char reserved
7 7 nfcPgn uns char page number format code:
0 Arabic
1 Roman (upper case)
2 Roman (lower case)
3 Letter (upper case)
4 Letter (lower case)
8 8 pgnStart uns user specified starting page number.
10 A fBiDi uns flag for bidi section
10 A wSpare1 uns
12 C fPgnRestart uns char set to 1 when page numbering should be
restarted at the beginning of this section
13 D fEndNote uns char when 1, footnotes placed at end of section.
When 0, footnotes are placed at bottom of
page.
14 E lnc char line numbering code:
0 Per page
1 Restart
2 Continue
15 F grpfIhdt char specification of which headers and footers are
included in this section. See explanation in
Headers and Footers topic.
16 10 nLnnMod uns if 0, no line numbering, otherwise this is the
line number modulus (e.g. if nLnnMod is 5,
line numbers appear on line 5, 10, etc.)
18 12 dxaLnn int distance of
20 14 dyaHdrTop uns y position of top header measured from top
edge of page.
22 16 dyaHdrBottom uns y position of bottom header measured from
bottom edge of page.
24 18 fLBetween char when ==1, draw vertical lines between
columns
25 19 vjc char vertical justification code
0 top justified
1 centered
2 fully justified vertically
3 bottom justified
26 1A lnnMin int beginning line number for section
28 1C morPage uns char orientation of pages in that section. set to 0
when portrait, 1 when landscape
30 1E xaPage uns width of page
32 20 yaPage uns default value is 15840 twips
height of page
34 22 dxaLeft uns default value is 1800 twips
left margin
36 24 dxaRight uns default value is 1800 twips
right margin
38 26 dyaTop int default value is 1440 twips
top margin
40 28 dyaBottom int default value is 1440 twips
bottom margin
42 2A dzaGutter uns default value is 0 twips
gutter width
44 2C dmBinFirst uns bin number supplied from windows printer
driver indicating which bin the first page of
section will be printed.
46 2E dmBinOther uns bin number supplied from windows printer
driver indicating which bin the pages other
than the first page of section will be printed.
48 30 dxaColumnWidth uns used internally by Word.
sizeof (SEP) == 50 == 0x32.
The standard SEP is all zeros except:
bkc 2
dyaPgn 720 twips (equivalent to .5 in)
dxaPgn 720 twips
fEndNote True
dyaHdrTop 720 twips
dyaHdrBottom 720 twips
SEPX: Section Property Exceptions
Dec Hex field type size bitfield comment
0 0 cb byte count of bytes in remainder of SEPX.
1 1 grpprl char[ ] list of sprms that encodes the differences
between the properties of a section and
Word's default section properties.
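Fetching the grpprl of a SEPX, given the fcSepx value from a SED, could look like the sketch below. The function name read_sepx is hypothetical, and the whole file is assumed to be held in memory as a bytes object. Interpreting the individual sprms inside the grpprl is outside the scope of this sketch.

```python
def read_sepx(file_bytes: bytes, fc_sepx: int):
    """Return the grpprl bytes of the SEPX at fc_sepx.

    Returns None when fc_sepx is 0xFFFFFFFF, i.e. when the section
    uses the standard SEP and no SEPX is stored.
    """
    if fc_sepx == 0xFFFFFFFF:
        return None
    cb = file_bytes[fc_sepx]              # one-byte count of the remainder
    grpprl = file_bytes[fc_sepx + 1: fc_sepx + 1 + cb]
    if len(grpprl) != cb:
        raise ValueError("SEPX runs past the end of the file")
    return grpprl
```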
TAP: Table Properties
Dec Hex field type size bitfield comments
0 0 jc int justification code. specifies how table row
should be justified within its column.
0 left justify
1 center
2 right justify
2 2 dxaGapHalf int measures half of the white space that will
be maintained between text in adjacent
columns of a table row. A dxaGapHalf
width of white space will be maintained on
both sides of a column boundary.
4 4 dyaRowHeight int when greater than 0, guarantees that the
height of the table will be at least
dyaRowHeight high. When less than 0,
guarantees that the height of the table will
be exactly absolute value of
dyaRowHeight high. When 0, table will
be given a height large enough to
represent all of the text in all of the cells
of the table.
6 6 fCaFull int :1 0001 used internally by Word
fFirstRow int :1 0002 used internally by Word
fLastRow int :1 0004 used internally by Word
fOutline int :1 0008 used internally by Word
fBiDi int :1 0010 table orientation
* int :11 FFE0 reserved
* int :12 FFF0 reserved
8 8 itcMac int count of cells defined for this row. itcMac
must be >= 0 and less than or equal to 32.
10 A dxaAdjust int used internally by Word
12 C rgdxaCenter int[itcMax + 1] rgdxaCenter[0] is the left boundary of cell
0 measured relative to margin.
rgdxaCenter[tap.itcMac - 1] is left
boundary of last cell.
rgdxaCenter[tap.itcMac] is right boundary
of last cell.
78 4E rgtc TC[itcMax] array of table cell descriptors
398 18E rgshd SHD[itcMax] array of cell shades
sizeof(TAP) == 462 == 0x1CE.
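The offsets in the TAP table and the meaning of rgdxaCenter can be checked with a short sketch. Two values here are inferred rather than stated in the text: itcMax is taken to be 32 (the upper limit given for itcMac) and each SHD is taken to be 2 bytes, since those values reproduce the offsets 78, 398 and 462. The names tap_offsets and cell_extents are hypothetical.

```python
ITC_MAX = 32  # assumption: itcMax equals the documented limit on itcMac

def tap_offsets():
    """Recompute the offsets of rgtc, rgshd and the end of the TAP."""
    rgdxa_center = 12
    rgtc = rgdxa_center + 2 * (ITC_MAX + 1)  # int[itcMax + 1], 2 bytes each
    rgshd = rgtc + 10 * ITC_MAX              # sizeof(TC) == 10
    end = rgshd + 2 * ITC_MAX                # assumption: sizeof(SHD) == 2
    return rgtc, rgshd, end

def cell_extents(rgdxa_center, itc_mac):
    """Return (left boundary, width) in twips for each defined cell."""
    if not 0 <= itc_mac <= ITC_MAX:
        raise ValueError("itcMac must be between 0 and 32")
    return [(rgdxa_center[i], rgdxa_center[i + 1] - rgdxa_center[i])
            for i in range(itc_mac)]
```

A row with boundaries at 0, 1440 and 4320 twips and itcMac == 2 has a first cell 1440 twips wide and a second cell 2880 twips wide.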
TBD: Tab Descriptor
The TBD is a substructure of the PAP.
Dec Hex field type size bitfield comments
0 0 jc int :3 07 justification code
0 left tab
1 centered tab
2 right tab
3 decimal tab
4 bar
tlc int :3 38 tab leader code
0 no leader
1 dotted leader
2 hyphenated leader
3 single line leader
* int :2 C0 reserved
sizeof(TBD) == 1.
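Because the TBD is a single byte, decoding it is a matter of applying the two masks from the table. The sketch below is illustrative only: decode_tbd is a hypothetical name, and values outside the documented code ranges are reported as "unknown" rather than rejected.

```python
JC_NAMES = ["left tab", "centered tab", "right tab", "decimal tab", "bar"]
TLC_NAMES = ["no leader", "dotted leader", "hyphenated leader",
             "single line leader"]

def decode_tbd(tbd: int):
    """Return (justification, leader) names for a one-byte TBD."""
    jc = tbd & 0x07           # low three bits: justification code
    tlc = (tbd & 0x38) >> 3   # next three bits: tab leader code
    jc_name = JC_NAMES[jc] if jc < len(JC_NAMES) else "unknown"
    tlc_name = TLC_NAMES[tlc] if tlc < len(TLC_NAMES) else "unknown"
    return jc_name, tlc_name
```

For example, the byte 0x0B decodes to a decimal tab with a dotted leader.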
TC: Table Cell Descriptors
The TC is a substructure of the TAP.
Dec Hex field type size bitfield comments
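0 0 fFirstMerged int :1 0001 set to 1 when cell is first cell of a range of
cells that have been merged. When a cell is
merged, the display areas of the
merged cells are consolidated and the text
within the cells is interpreted as belonging
to one text stream for purposes of
calculating line breaks.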
fMerged int :1 0002 set to 1 when cell has been merged with
preceding cell.
fUnused int :14 FFFC reserved
2 2 brcTop BRC specification of the top border of a table
cell
4 4 brcLeft BRC specification of left border of table row
6 6 brcBottom BRC specification of bottom border of table row
8 8 brcRight BRC specification of right border of table row.
sizeof(TC) == 10 == 0xA.
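A 10-byte TC can be unpacked as five 16-bit words. The sketch below rests on several assumptions: the byte order is little-endian, the BRCs are left as raw 16-bit values, and the flag at mask 0x0001 is named fFirstMerged. The table above documents only fMerged (0x0002) and the 14 reserved bits, so that name is a guess. decode_tc is a hypothetical name.

```python
import struct

def decode_tc(raw: bytes) -> dict:
    """Decode a TC: a flag word followed by four BRC words.

    Assumptions: little-endian storage; bit 0x0001 is fFirstMerged.
    """
    flags, top, left, bottom, right = struct.unpack_from("<5H", raw, 0)
    return {
        "fFirstMerged": bool(flags & 0x0001),  # assumed name for bit 0
        "fMerged": bool(flags & 0x0002),       # merged with preceding cell
        "brcTop": top,
        "brcLeft": left,
        "brcBottom": bottom,
        "brcRight": right,
    }
```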
Changes to Structures
BRC
The previously defined BRC is obsolete and has been renamed BRC10. A new BRC is defined with new fields
and field names.
CHP
The size of the CHP changed from 16 to 32 bits, with some spare bits added.
The fStrike, hpsPos, & fSysVanish fields were moved within the CHP. A new field, fRMarkDel, is located
where fStrike previously was.
The fsLid and lid fields were added for the language identification code. Possible values for the lid are defined
at the lid field definition.
The types of several fields were changed. The ftc field was changed from an unsigned integer to a WORD.
The hps field was changed from an unsigned char to a WORD. The fnPic field was changed from an unsigned integer to
a BYTE.
The fObj and fcObj fields were added for managing embedded objects.
DOP
The unused field fWide was removed.
The type of the irmBar field was changed from an int to a BYTE.
The spare field rgwSpare was redefined as wSpare2 and wSpare3.
New fields fPMHMainDoc, grfSuppression, fKeepFileFormat, fDfltTrueType, and fPagSuppressTopSpacing
were added.
The page dimensions and margin fields, xaPage, yaPage, dxaLeft, dxaRight, dyaTop, dyaBottom, dxaGutter,
were moved from the DOP to the SEP. The DOP dxaGutter field was renamed to dzaGutter in the SEP.
DTTM
This newly defined structure defines Word's internal date format.
FIB
The nLocale field name was changed to lid, a language identification code. If the value of this field is less than
999, it represents nLocale, otherwise it represents a lid. (Defined lid values are enumerated in the CHP structure
definition.)
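The rule for interpreting the renamed field can be written as a one-line test. In this sketch the function name classify_fib_lid is hypothetical, and the value is assumed to have been read from the FIB already.

```python
def classify_fib_lid(value: int):
    """Say whether the FIB field holds an old nLocale or a lid.

    Per the rule above: values less than 999 are nLocale, all others
    are language identification codes.
    """
    return ("nLocale", value) if value < 999 else ("lid", value)
```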
The types of the wIdent, nFib, nProduct and lid (formerly nLocale) fields were changed from int to uns.
The fEncrypted field was added for managing file encryption.
The fcPrEnv and cbPrEnv fields were renamed fcPrDrv and cbPrDrv, respectively.
The fcPrEnvPort, cbPrEnvPort, fcPrEnvLand and cbPrEnvLand fields were added to store information about the print
environment and page orientation.
The fcAutosaveSource and cbAutosaveSource fields were added.
The pnChpFirst and pnPapFirst fields were added.
FLD
The ch field was reduced from eight to seven bits, and a new bitfield, fDirty was added.
New field types (flt) were defined for Link, Symbol, Embedded Object, Merge, User Name, User Initial, User
Address.
OBJHEADER
This new structure defines the Embedded Object Properties.
PAP
The fields nfcSeqNumb and nnSeqNumb were added to store auto numbering information.
The fields dyaFromText, wr, dyaHeight, and fMinHeight were added to store information about frames
(Absolutely Positioned Objects). When converting 1.x documents with Absolutely Positioned Objects set the old
dxaFromText (Distance from text) to both dxaFromText and dyaFromText.
The shd field was added to store information about paragraph shading.
The size of the PAP structure has changed from 210 == 0xD0 to 216 == 0xD8.
PGD
The type of the cl field changed from uns to int, and the type of the pgn field changed from int to uns.
PIC
The brcl field is obsolete. In Word for Windows 1.x, this field stored the type of border to place around a
picture.
SED
The fSpare spare field was changed to fSwap, a runtime flag for landscape/portrait orientation.
SEP
The page dimensions and margin fields, xaPage, yaPage, dxaLeft, dxaRight, dyaTop, dyaBottom, dxaGutter,
were moved from the DOP to the SEP. The DOP dxaGutter field was renamed to dzaGutter in the SEP. The morPage
field and the reserved bUnused2 field were added.
The fAutoPgn field was changed to bUnused1.
The dmBinFirst and dmBinOther fields were added to store information about the printer environment.
The size of the SEP structure changed from 30 == 0x1E to 50 == 0x32.
TAP
The spare fields, wSpare1, wSpare2, wSpare3, wSpare4, and wSpare5, were removed.
The rgshd array field was added at the end of the structure.
TC
The types of the rgbrc, brcTop, brcLeft, brcBottom, and brcRight fields were changed from int to BRC.
Other changes
Autosave Source
This information is written immediately after the sttbfAssoc table and appears only in autosave files.
Embedded Objects
Embedded objects are a new item in the file format. The native data for an embedded object (OBJ) is stored
similarly to pictures (PIC). Note the addition of the OBJHEADER structure.
Hand Annotation
When chp.fSpec == 1, the ASCII code 6 is a special character marking a Hand Annotation (from Pen
Windows).
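Locating Hand Annotation markers in a text run could be sketched as follows. The function name hand_annotation_positions is hypothetical. The sketch assumes the caller has already resolved chp.fSpec for every character and passes those flags in parallel with the text bytes.

```python
def hand_annotation_positions(text: bytes, fspec_flags):
    """Return the offsets of Hand Annotation special characters.

    A character counts only when its code is 6 AND chp.fSpec is 1
    for it; a code 6 with fSpec == 0 is not a Hand Annotation marker.
    """
    return [i for i, (ch, fspec) in enumerate(zip(text, fspec_flags))
            if ch == 6 and fspec == 1]
```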
New Sprm definitions
The previous Border sprms, sprmPBrcTop, sprmPBrcLeft, sprmPBrcBottom, sprmPBrcRight,
sprmPBrcBetween, sprmPBrcBar and sprmPBrcFromText, are renamed with "10" appended to each name. These sprms
now refer to the BRC10 structure. New sprm values are defined for the original names, and also for sprmPicBrcTop,
sprmPicBrcLeft, sprmPicBrcBottom, and sprmPicBrcRight, that refer to the redefined BRC structure.
New sprms for auto-numbering paragraphs are sprmPNfcSeqNumb and sprmPNoSeqNumb. Other new
paragraph property sprms are sprmPWHeightAbs, sprmPShd, sprmPDyaFromText, and sprmPDxaFromText.
New character property sprms are sprmCFStrikeRM, sprmCFRMark, sprmCFFldVanish and sprmCLid.
New section property sprms are sprmSDmBinFirst, sprmSDmBinOther, sprmSFAutoPgn, sprmSDyaPgn,
sprmSDxaPgn, and sprmSBOrientation.
The previous Table sprms sprmTDefTable and sprmTSetBrc are renamed with "10" appended to each name.
New sprm values are defined for the original names. New sprms for table cell shading are sprmTDefTableShd and
sprmTSetShd.
sttbfAssoc
Indices to the associated string table and descriptions of strings are included.
sttbfFn
The names for all fonts are explicitly included in the font name table. It is still true that ftc = 0 represents the
"best" Roman PS font on the system, ftc = 1 represents the Symbol font, and ftc = 2 represents the "best" Swiss (Sans
Serif) PS font available.
5/29/93 additions for Bidi version 2.0c (by Alex Morcos)
10/25/91 Reformatted document, removed revision marks and completed the summary
of changes from Word for Windows 1.x to 2.0 format.
5/10/91 Updated structures and sprm table for Word for Windows 2.0 format.
1/23/90 Corrected offsets with the definition of the FIB
6/16/89 Updated structure definitions
1/9/89 Document Created